Stannum

Sony ARW distortion correction

2018-05-09

Sony cameras save their raw output in the proprietary ARW format. It is a TIFF-based format that stores a lot of undocumented and partially encrypted metadata in its TIFF directories. Some have already tried to reverse engineer the format[1][2], but most of it still remains undocumented.

Here I summarize my findings on how the distortion correction is applied. The following are my best guesses, based on the ILCE-7RM2 camera I have.

Overscan

The first thing to notice is that the camera produces JPEGs of 7952×5304 pixels, whereas the ARW contains an 8000×5320 raster. One source of the discrepancy is 32 pixels of overscan on the right; they simply repeat the rightmost Bayer cell 16 times.

A crop from the right edge of a raw image, featuring overscan.

Cropping the overscan yields a 7968×5320 raster. This corresponds to the SonyImageWidthMax and SonyImageHeightMax tags reported by ExifTool. I hypothesize that the overscan pixels do not come from physical photosites on the sensor, but are rather a result of the way the sensor is read out. This is crucial because it suggests that the optical center, for the purpose of radial corrections, is located at (3984, 2660) on the ARW rather than at (4000, 2660).

Next, if distortion correction is disabled, the final JPEG is simply a crop of the central 7952×5304 pixels around the optical center. This corresponds to SonyImageWidth and SonyImageHeight, whose values are duplicated in the EXIF tags. The extra eight-pixel border on each side is essential for proper demosaicing and resampling near the image edges.
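The arithmetic above can be written down as a minimal sketch. The function name `crop_geometry` and its parameters are hypothetical, and the assumption that the overscan always sits on the right edge and is 32 pixels wide comes only from the files described here.

```python
def crop_geometry(arw_w, arw_h, jpeg_w, jpeg_h, overscan=32):
    """Return the assumed optical center and the JPEG crop box in ARW pixels."""
    sensor_w = arw_w - overscan           # overscan is on the right edge only
    sensor_h = arw_h
    center = (sensor_w / 2, sensor_h / 2)  # assumed optical center on the ARW
    left = (sensor_w - jpeg_w) // 2        # border left for demosaicing
    top = (sensor_h - jpeg_h) // 2
    # Box is (left, top, right, bottom), right and bottom exclusive.
    return center, (left, top, left + jpeg_w, top + jpeg_h)


# Full-frame numbers from the text above.
center, box = crop_geometry(8000, 5320, 7952, 5304)
print(center, box)
```

With the full-frame sizes this reproduces the (3984, 2660) center and an eight-pixel border on every side.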

Undistortion

The distortion correction coefficients are stored in the DistortionCorrParams tag (in both the SR2 directory and the RAW directory) as a sequence of seventeen 16-bit signed integers. The first integer gives the number of valid coefficients that follow. For example:

DistortionCorrParams : 16 -5 -1 4 11 18 26 35 43 49 54 55 51 40 20 -13 -60
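As a small illustration, here is how the tag value could be split into the count and the valid coefficients. It assumes the value arrives as the space-separated string printed by ExifTool; `parse_distortion_params` is a hypothetical helper, not part of any library.

```python
def parse_distortion_params(text):
    """Split a DistortionCorrParams string into its valid coefficients."""
    values = [int(v) for v in text.split()]
    count, coeffs = values[0], values[1:]
    if not 0 <= count <= len(coeffs):
        raise ValueError("coefficient count does not fit the data")
    return coeffs[:count]  # entries past `count` are treated as padding


knots = parse_distortion_params(
    "16 -5 -1 4 11 18 26 35 43 49 54 55 51 40 20 -13 -60"
)
print(len(knots), knots)
```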

Graphing them gives a curve like this:

This looks like the knots of a spline to me. To confirm this hypothesis and figure out the exact interpretation of that spline, I went on to recover the actual correction curve by matching the ARW against the in-camera JPEG.

In detail, I assume that the mapping from a point P on a corrected JPEG to the distorted ARW is done by:

distorted(P) = (P − Cjpeg) (1 + f(ǁP − Cjpegǁ)) + Carw

where C’s are the optical centers and f(R) is a function I’m trying to figure out.
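The mapping can be transcribed directly into code. This is only a sketch of the formula above: the name `distorted` mirrors the formula, the centers are the ones derived in the Overscan section, and `f` is passed in as any callable because its true form is exactly what is being investigated.

```python
import math

C_JPEG = (7952 / 2, 5304 / 2)  # center of the corrected JPEG
C_ARW = (3984.0, 2660.0)       # assumed optical center on the ARW


def distorted(p, c_jpeg, c_arw, f):
    """Map a point on the corrected JPEG to the distorted ARW."""
    dx, dy = p[0] - c_jpeg[0], p[1] - c_jpeg[1]
    r = math.hypot(dx, dy)      # radius from the JPEG center
    scale = 1.0 + f(r)          # radial magnification at that radius
    return (dx * scale + c_arw[0], dy * scale + c_arw[1])


# With f = 0 the mapping reduces to the plain eight-pixel crop offset.
print(distorted((0.0, 0.0), C_JPEG, C_ARW, lambda r: 0.0))
```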

To do that I:

  1. Demosaic the ARW and extract the green channel from it and the JPEG.
  2. Cut the JPEG image into concentric rings about 32 pixels thick, remapped to conformal[3] polar coordinates. For each of these 32-pixel sections:
  3. Extract an annular neighborhood from the ARW, also conformally remapped.
  4. Template match the JPEG section onto the ARW section. I use normalized correlation as a score and refine the maximum to sub-pixel precision with a local quadratic approximation.
  5. Convert the previously found offset back into a multiplicative factor and subtract one.
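The last two steps can be sketched as follows. This is not my actual matching code, only an illustration under assumptions: `scores` stands for correlation scores sampled along the radial axis of the log-polar image, `pixels_per_log_unit` is a hypothetical name for the sampling density of that axis (pixels per unit of ln r), and the sign convention of the offset is arbitrary.

```python
import math


def refine_peak(scores, i):
    """Sub-sample position of a maximum near index i.

    Fits a parabola through the three samples around i and returns
    the location of its vertex.
    """
    a, b, c = scores[i - 1], scores[i], scores[i + 1]
    denom = a - 2 * b + c
    if denom == 0:              # flat neighborhood, nothing to refine
        return float(i)
    return i + 0.5 * (a - c) / denom


def offset_to_f(offset_px, pixels_per_log_unit):
    """Turn a radial shift in log-polar space into f = scale - 1."""
    # A shift along the log-radius axis is a multiplicative change in radius.
    return math.exp(offset_px / pixels_per_log_unit) - 1.0
```

Because the remapping is logarithmic in the radius, a constant shift of a ring corresponds to a constant scale factor, which is why the offset goes through an exponential.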

Plotting the result gives the following graph of f(R):

This curve is remarkably similar to the plot of DistortionCorrParams. To match the two curves I apply three adjustments:

  • X scale: I guess that the knots of the spline are spread evenly between the center and the corner of the frame. Therefore I scale the x-axis of my estimate to the [0, 15] range.
  • Y scale: I divide the values of the Sony curve by 16384. It is a power of two (2^14) that seems to match the data.
  • Y offset: My estimates of f are always negative. It is also visible that the JPEGs are always ‘zoomed-in’, so that Sony’s analogue of f(R) is also always negative. After all, it makes sense for f(R) to be negative because it prevents sampling from outside the captured image. Therefore I shift the Sony curve downwards by subtracting its maximum.
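The three adjustments combine into a small evaluation routine. This is a sketch under stated assumptions: `sony_f` is a hypothetical name, the corner radius is taken from the JPEG size, and linear interpolation between knots is only a stand-in, since the actual spline type the camera uses is unknown.

```python
import math

KNOTS = [-5, -1, 4, 11, 18, 26, 35, 43, 49, 54, 55, 51, 40, 20, -13, -60]
R_CORNER = math.hypot(7952 / 2, 5304 / 2)  # center-to-corner distance, pixels


def sony_f(knots, r, r_corner, scale=16384.0):
    """Evaluate the adjusted Sony curve at radius r (pixels from the center)."""
    top = max(knots)
    ys = [(k - top) / scale for k in knots]   # Y offset, then Y scale
    x = r / r_corner * (len(ys) - 1)          # X scale: knots span [0, n - 1]
    x = min(max(x, 0.0), len(ys) - 1.0)       # clamp to the knot range
    i = min(int(x), len(ys) - 2)
    t = x - i
    # Linear interpolation, a stand-in for the unknown spline.
    return ys[i] * (1 - t) + ys[i + 1] * t


print(sony_f(KNOTS, 0.0, R_CORNER), sony_f(KNOTS, R_CORNER, R_CORNER))
```

Subtracting the maximum before scaling guarantees that the result is never positive, which matches the observation that the JPEGs are always zoomed in.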

Here are the two curves after the above adjustments:

Here is a comparison from two other photographs:

The fits aren't perfect, and I'm still hoping to figure out why (aside from the noise).

APS-C mode

In APS-C mode the camera reads a smaller part of the sensor. Nevertheless, the process of cropping the overscan and undistorting seems to be identical. The ARW size is now 5216×3464, the overscan crop reduces it to 5184×3464, and the final JPEG size is 5168×3448, all matching the tags mentioned above.

The only major difference is that the camera now writes only 11 coefficients into DistortionCorrParams. At first I thought that the spline knots were located at the same radii, measured in pixels, as in the full-frame case. However, that doesn’t seem to be the case. Instead, if I spread them uniformly from the center to the corner of the undistorted image, the result matches the JPEG better.
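To make the two hypotheses concrete, the knot spacing under the uniform center-to-corner assumption can be computed for both modes. `knot_radii` is a hypothetical helper, and using the JPEG dimensions as the undistorted image size is an assumption.

```python
import math


def knot_radii(width, height, n_knots):
    """Radii of n_knots knots spread evenly from the center to the corner."""
    r_corner = math.hypot(width / 2, height / 2)
    return [r_corner * i / (n_knots - 1) for i in range(n_knots)]


full_frame = knot_radii(7952, 5304, 16)  # 16 coefficients in full-frame mode
aps_c = knot_radii(5168, 3448, 11)       # 11 coefficients in APS-C mode

# Spacing between adjacent knots in pixels, for each mode.
print(full_frame[1], aps_c[1])
```

The two spacings come out close but not equal, so distinguishing the hypotheses requires comparing against the JPEG rather than just inspecting the numbers.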

Conclusion

The process described above gives good results, but it does not exactly match the in-camera JPEGs. There is still a visible radial offset of up to a pixel in some regions. It might be attributed to a different spline interpolation method, chromatic aberration correction I didn’t account for, or other unknown factors.

Footnotes

  1. Dave Coffin of dcraw figured out the decryption of the SR2 subdirectory.
  2. See Summary: Sony’s ARW (v2.3.1) embedded lens-correction data by variousphotography for previous lens-correction reverse engineering attempts.
  3. That is, as if by the complex logarithm. This is important because we are trying to fit a multiplicative factor.
