Notes on Anti-Aliasing: Texture filtering (part 3)

Texture aliasing can easily produce unpleasant artifacts, some really visible, others harder to track down. In this post, I’ll try to go a bit further than ‘Just use mipmapping, or X, Y, Z methods to prevent aliasing’.

Fourier, always him…

I scratched the surface in the introduction of this series, but I’m afraid I’ll have to go a tiny bit deeper into Fourier analysis for this one. Memories of my DSP courses are vague, so I’ll try to keep this simple.

The only things I remember (unfortunately that’s true…) from my DSP courses are the following:

  • Any periodic signal can be represented as a sum of weighted sinusoids
  • The Fourier transform transforms a signal from the spatial domain into the frequency domain, and vice versa with the inverse Fourier transform
  • Convolution in the spatial domain is the same as multiplication in the frequency domain
  • Multiplication in the spatial domain is the same as convolution in the frequency domain
  • Nyquist sampling theorem: the sampling frequency has to be at least twice the maximum frequency for the original signal to be reconstructed properly

Basically, we often use these because if a problem is too hard to solve in one domain, it may be easier to solve in the dual domain. Let’s say we have a continuous signal that we sample, and then from these samples, we want to reconstruct the original signal. Our sampling may not be good enough to achieve a proper reconstruction (remember Nyquist!), so in this case we have two options: ‘simplifying’/‘adapting’ the original signal by removing high frequencies from it with a low-pass filter (some kind of blur), or using a higher sampling frequency. The latter is extremely expensive for real-time applications, so the former is our usual go-to solution. Choosing a good low-pass filter is key to reconstructing the input signal properly. Ideally, the best possible low-pass filter would suppress all frequencies above the Nyquist limit (half the sampling frequency) while leaving everything below it untouched. In the spatial domain, this ideal filter is the sinc function. I won’t derive that result here, but you can find the proof in any decent DSP book. The problem with this ideal filter is that its support in the spatial domain is infinite, so it cannot be evaluated exactly in practice. So we have to find alternatives:

  • the box filter, also known as nearest neighbour or point sampling in graphics. It produces ugly pixelization because it considers pixels as little squares, and they are not (see the article by Alvy Ray Smith linked at the end)
  • the triangle filter, also known as bilinear filtering in graphics. For textures, it computes the weighted average of the 2×2 nearest texels (see the sketch after this list). So no pixelization, but still blurry.
  • the cubic filter, also known as bicubic filtering in graphics. For textures, it computes the weighted average of the 4×4 nearest texels. Almost no blur anymore.
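
To make the triangle filter concrete, here is what it looks like written by hand in GLSL. This is only a minimal sketch: hardware gives you this for free with GL_LINEAR, and the function name, the [0,1] uv convention and the missing border handling are my own simplifications.

```glsl
// Manual bilinear (triangle) filter: weighted average of the 2x2 nearest texels.
vec4 bilinearSample(sampler2D tex, vec2 uv)
{
    vec2 size = vec2(textureSize(tex, 0));
    vec2 pos  = uv * size - 0.5;        // position in texel space
    ivec2 i0  = ivec2(floor(pos));      // top-left texel of the 2x2 block
    vec2 f    = fract(pos);             // interpolation weights

    vec4 t00 = texelFetch(tex, i0,               0);
    vec4 t10 = texelFetch(tex, i0 + ivec2(1, 0), 0);
    vec4 t01 = texelFetch(tex, i0 + ivec2(0, 1), 0);
    vec4 t11 = texelFetch(tex, i0 + ivec2(1, 1), 0);

    // Two horizontal lerps, then one vertical lerp between them.
    return mix(mix(t00, t10, f.x), mix(t01, t11, f.x), f.y);
}
```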

What is mipmapping?

Mipmapping [1, 3] is a texture prefiltering technique invented by Lance Williams in 1983. The objective is to adapt the texture resolution to the screen resolution. When a texture is created with mipmaps enabled, it is filtered down repeatedly with a box or triangle filter into a pyramid of images, each level half the size of the previous one. It can be seen as a way to choose the level of detail of the texture sampling. For instance, with a 512×512 mipmapped texture, level 0 is the original 512×512, level 1 is 256×256, level 2 is 128×128, and so on… On a texture of dimension w \times h, there are \lfloor \log_2(\max(w,h)) \rfloor + 1 levels in the pyramid. If the input texture dimensions are not powers of 2, each successive level’s dimensions are simply halved and rounded down. In memory, mipmapped textures are rather cheap, with only \frac{1}{3} additional space required (because \sum_{n=0}^{\infty} \frac{1}{4^{n}} = \frac{4}{3} = 1 + \frac{1}{3} ).
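
To make the downsampling step concrete, a single mip reduction can be written as a tiny fragment shader that box-filters a 2×2 block of the previous level. This is only a sketch of what glGenerateMipmap already does for you; uPrevLevel is a made-up uniform name, and the render target is assumed to be the next, half-sized level.

```glsl
#version 330 core
// One mip reduction step: each destination texel is the box-filtered
// average of a 2x2 block of the previous (twice as large) level.
uniform sampler2D uPrevLevel;   // hypothetical: view of level N-1
out vec4 fragColor;

void main()
{
    ivec2 dst = ivec2(gl_FragCoord.xy);  // destination texel in level N
    ivec2 src = dst * 2;                 // top-left of the 2x2 source block
    fragColor = 0.25 * (texelFetch(uPrevLevel, src,               0)
                      + texelFetch(uPrevLevel, src + ivec2(1, 0), 0)
                      + texelFetch(uPrevLevel, src + ivec2(0, 1), 0)
                      + texelFetch(uPrevLevel, src + ivec2(1, 1), 0));
}
```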

When a texture is applied on a surface, there are 3 possibilities:

  • One screen pixel maps exactly to one of the texels in the texture. That’s rare, but in this case, no need to filter the texture
  • Magnification: when an object is close to the camera, many screen pixels map to a few texels. This case is rather simple to handle: the texture is too small for the surface. Nearest neighbour point sampling would cause jaggies, and bilinear filtering would be better but still causes overblurring, so the best solution would be to create and use a bigger texture
  • Minification: when an object is far from the camera, multiple texels map to a single pixel on the screen. Minification is by far the most annoying of the three. Without any filtering, we get a high-frequency signal with a low-frequency sampling, resulting in ugly aliasing artifacts (Moiré patterns, flickering…). Mipmapping combined with bilinear interpolation gives better results, but is far from perfect.

Another problem arises when the camera is moving: we would notice the mipmap level changing, with a visible seam where the level switches. To prevent that and produce a smooth transition between levels, we can use trilinear mipmapping, which is a linear interpolation between the two nearest mipmap levels (in other words, a linear interpolation between two bilinear interpolations).
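
In GLSL terms, trilinear filtering is just one extra lerp between two bilinear lookups. A minimal sketch, assuming the sampler is mipmapped and the fractional level d comes from the computation described in the next section:

```glsl
// Trilinear filtering by hand: blend two bilinear taps from adjacent levels.
// Passing integral lods to textureLod makes each tap a plain bilinear lookup.
vec4 trilinearSample(sampler2D tex, vec2 uv, float d)
{
    float lo = floor(d);
    vec4 a = textureLod(tex, uv, lo);        // bilinear tap at the lower level
    vec4 b = textureLod(tex, uv, lo + 1.0);  // bilinear tap at the upper level
    return mix(a, b, fract(d));              // blend by the fractional level
}
```

Hardware does the same thing for you when the min filter is GL_LINEAR_MIPMAP_LINEAR.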

How is the mipmap level chosen?

We would expect the mipmap level associated with an object to be 0 when the object is really close to the camera, and maximal when the object is far away. In other words, the mipmap level to use is based on the projected size of the object on the screen.

But concretely, how is that computed? After all, when you call texture(sampler2D, uv) in an OpenGL fragment shader, you don’t explicitly specify the mipmap level. The way it is done is by using screen-space derivatives (dFdx and dFdy in OpenGL) of the texture coordinates to compute the rate of change of texels per pixel. Because the GPU processes fragments in 2×2 blocks, we can use screen-space derivatives to approximate what I’ll call from now on the pixel footprint, which is the coverage of a pixel projected into texture space. (Need a refresher on how screen-space derivatives work? Great explanations here)

To pick the mipmap level d that most closely matches the 1 pixel to 1 texel ratio, a few methods exist (I don’t think the exact formula is mandated by any spec, so it must be implementation specific). A common way to compute d is based on the direction of largest gradient:

d = log_2(max(\sqrt{dFdx.u^2+dFdx.v^2}, \sqrt{dFdy.u^2+dFdy.v^2}))
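
This can be reproduced by hand in a fragment shader. A minimal sketch, assuming uv is the interpolated texture coordinate; the derivatives are scaled by the texture size so that everything is in texel units, and clamping to the available level range is omitted:

```glsl
// Manual mip level selection, matching the max-gradient formula above.
float mipLevel(sampler2D tex, vec2 uv)
{
    vec2 size = vec2(textureSize(tex, 0));
    vec2 dx = dFdx(uv) * size;   // texel-space change of uv across one pixel in x
    vec2 dy = dFdy(uv) * size;   // texel-space change of uv across one pixel in y
    float rho = max(length(dx), length(dy));  // largest footprint extent, in texels
    return log2(rho);            // d = 0 when one pixel covers exactly one texel
}
```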

Mathematical derivations and explanations of the different ways to compute d can be found in the Eberly book [2].

Improvements to mipmapping

Mipmapping is the main texture prefiltering technique implemented by hardware. It is extremely cheap and produces overall good results most of the time, but under camera perspective, overblurring can appear. The reason is that when we pick a mipmap level, we approximate the pixel footprint with a square. Depending on the camera perspective, the pixel footprint can be stretched in one direction, and approximating it with a square then gives wrong results. There are a few alternatives to mipmapping that add anisotropy and reduce this overblurring.

Ripmapping is an alternative to mipmapping. With ripmapping, we downsample the base texture by powers of two in each direction independently, storing every combination of horizontal and vertical reductions instead of only the uniform ones. This way, we can approximate the pixel footprint with an axis-aligned rectangle.

The Summed-Area Table (SAT) is another method that can be used to reduce the overblurring caused by mipmapping. Again at the cost of a precomputation step, it lets us average any axis-aligned rectangle of texels in constant time (four lookups).
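
A minimal sketch of a SAT lookup, assuming uSat is a high-precision (floating point) texture where each texel stores the inclusive sum of all texels above and to the left of it. Building that table is the precomputation step, and border handling (lo must be at least 1 here) is omitted:

```glsl
// Average of the axis-aligned texel rectangle [lo, hi], inclusive,
// using four summed-area table lookups and inclusion-exclusion.
uniform sampler2D uSat;  // hypothetical: the precomputed summed-area table

vec4 satAverage(ivec2 lo, ivec2 hi)
{
    vec4 a = texelFetch(uSat, hi,                    0);
    vec4 b = texelFetch(uSat, ivec2(lo.x - 1, hi.y), 0);
    vec4 c = texelFetch(uSat, ivec2(hi.x, lo.y - 1), 0);
    vec4 d = texelFetch(uSat, lo - 1,                0);
    float area = float((hi.x - lo.x + 1) * (hi.y - lo.y + 1));
    return (a - b - c + d) / area;  // rectangle sum, normalized by its area
}
```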

Elliptical weighted average (EWA) seems to be the state of the art in terms of quality – supersampling being out of the question for real-time rendering. It works by projecting the pixel footprint into texture space as an ellipse, whose axes are derived from the screen-space derivatives of the texture coordinates, and then filtering the texels inside the ellipse with a Gaussian kernel. Of course, it is slower than the other techniques. I’m not aware of any use of EWA filtering in a real-time application.

Usually, and if the hardware supports it, the easiest way to add anisotropy is to use hardware anisotropic filtering, which is relatively expensive but looks great. The hardware takes multiple samples from within the footprint and averages them together for a more accurate result. To put it simply, we approximate the pixel footprint with several smaller squares, which helps a lot when the footprint is stretched by perspective.
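
For intuition only, here is a hand-rolled GLSL approximation of the idea: several mipmapped taps spread along the major axis of the footprint. The actual hardware scheme is undocumented and far more refined, and the sample count N is an arbitrary choice of mine; in practice you would just enable it through the API (GL_TEXTURE_MAX_ANISOTROPY in GL 4.6).

```glsl
// Hand-rolled anisotropic sampling sketch: average several taps along the
// major axis of the footprint; each tap uses the minor axis as its gradient,
// so the selected mip level matches the short extent instead of the long one.
vec4 anisoSample(sampler2D tex, vec2 uv)
{
    const int N = 8;             // arbitrary sample count
    vec2 dx = dFdx(uv);
    vec2 dy = dFdy(uv);
    bool xMajor = dot(dx, dx) > dot(dy, dy);
    vec2 major = xMajor ? dx : dy;
    vec2 minor = xMajor ? dy : dx;

    vec4 sum = vec4(0.0);
    for (int i = 0; i < N; ++i)
    {
        // Spread the taps evenly across the major axis of the footprint.
        vec2 offset = major * (float(i) / float(N - 1) - 0.5);
        sum += textureGrad(tex, uv + offset, minor, minor);
    }
    return sum / float(N);
}
```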

No silver bullet…

I’ve now written more on mipmapping than I originally planned, but the thing is, it is rarely applicable as-is. For textures containing color information like albedo, it makes sense to average their content, because it affects the lighting in a linear way. But game assets today contain not only albedo, but also normal maps, roughness maps, metalness maps… These textures contain data affecting the lighting in a non-linear way, so we have to use other texture prefiltering techniques.

With normal maps for instance, a mipmapped texture far from the camera will ‘smooth’ the normals and produce unstable specular highlights: we call that specular aliasing. That is because what we want is to shade all the texels in the footprint and average the shaded results; with mipmapping instead, the normals are averaged first and then shaded once. I wanted to write an article exclusively on specular aliasing, but I might as well talk about it here. The main papers concerning normal map filtering are the following:

  • Toksvig [5]. With this technique, we approximate the Blinn-Phong lobe with a 1D Gaussian. The basic idea is to generate mipmaps for the normal map and use the variation of the normals to modify the value of the shininess exponent in Blinn-Phong: the shorter the averaged normal is, the larger the distribution of normals, and the rougher the surface (see the sketch after this list). Simple, but there are a few drawbacks: it can’t be used with 2-channel and/or compressed normal maps, it is tied to Blinn-Phong…
  • (C)LEAN [6]. With LEAN mapping, normals are represented as an off-center 2D Gaussian. Just in case, time for Stats101: a 1D Gaussian is defined by a mean \mu (location) and a variance \sigma^{2} (how data is spread out around the mean). A 2D Gaussian is defined by a mean vector \mu and a 2×2 symmetric covariance matrix \Sigma. The covariance matrix gives the variance of each variable along the diagonal, while the off-diagonal elements measure the correlations between the variables. End of Stats101. With LEAN mapping, the mean direction – 2 components – is stored in a texture, and regular mipmapping works well on it. We can’t apply regular mipmapping to the covariance matrix directly, so we store the raw second moment (moment about the origin, not the mean) – 3 components – in a texture instead; combined with the mean direction, it lets us reconstruct the covariance matrix. This technique produces good filtering and supports anisotropy, but the main drawback is that we need two new textures to reconstruct the filtered normal. CLEAN mapping solves that problem, sacrificing anisotropy in the process
  • Frequency domain filtering [7]: the first technique that could be used with arbitrary BRDFs. In this paper, the normal map filtering problem is seen as a spherical convolution of the NDF and the BRDF. Spherical harmonics are used to filter the NDF for diffuse/low-frequency BRDFs, while von Mises-Fisher distributions are used for specular/high-frequency BRDFs. The SIGGRAPH course notes for The Order: 1886 [9] contain the best explanation of it I could find or come up with myself.
  • LEADR [8]. Improved LEAN for displacement maps. Physically based, flexible, the state of the art; not yet used in games as far as I know, but I think it will become the reference in a few years when tessellation becomes more affordable.

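To give an idea of how cheap Toksvig’s trick is, here is a common GLSL formulation of the exponent adjustment (variable names are mine; the averaged normal must be fetched from the mipmapped normal map before any normalization, since its shortened length is precisely the information we need):

```glsl
// Toksvig adjustment: shorten a Blinn-Phong exponent where mipmapped normals
// disagree. 'avgNormal' is the *unnormalized* averaged normal from the mip
// chain; 'power' is the original shininess exponent.
float toksvigPower(vec3 avgNormal, float power)
{
    float len = length(avgNormal);            // < 1 where normals vary
    float ft  = len / mix(power, 1.0, len);   // = len / (len + power * (1 - len))
    return ft * power;                        // rougher surface -> smaller exponent
}
```
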
For implementation details, the reference is MJP’s SpecularAA project, in which the main specular anti-aliasing techniques used in games are implemented.

Conclusion

I hadn’t planned to write over 2k words on texture filtering, but it’s a subject I really like, so why not? I used no images/drawings because I honestly couldn’t come up with anything better than what’s in the footnotes. I could really go on for another thousand, but I’m not sure a blog post would be the most suitable format for that. For instance, I only talked about color and normal texture filtering. But what about metalness maps? Does it make sense to simply average two metalness values? Does it produce visible artifacts? I don’t know!

Texture filtering remains an open problem, even though we saw that several techniques exist that can reduce aliasing significantly. The reference for texture filtering techniques is the survey written by Bruneton and Neyret [4].

The next part of this series will be on MSAA/SSAA, so it should be shorter than this one. And after that, maybe a really thorough, long-as-a-day-without-bread article on TAA.

Additional reading

‘Knowing which mipmap levels are needed’ by Tom Forsyth. A few head-scratching moments in this article, really interesting.

‘A Pixel Is Not A Little Square’ by Alvy Ray Smith. The title says it all.

‘Gamma and Mipmapping’ by John Hable, in which he shows that mipmaps should be computed in linear space

‘Mipmapping part 1’ and ‘Mipmapping part 2’ by Jonathan Blow, in which he tries to find better reconstruction filters


  1. ‘Real-Time Rendering’ (Chapter 6) by Tomas Akenine-Möller, Eric Haines, Naty Hoffman
  2. ‘3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics’ by David H. Eberly.
  3. ‘Texture filtering’ by Steve Marschner
  4. ‘A Survey of Non-linear Pre-filtering Methods for Efficient and Accurate Surface Shading’ by Eric Bruneton and Fabrice Neyret
  5. ‘Mipmapping normal maps’ by Michael Toksvig
  6. ‘LEAN Mapping’ by Marc Olano and Dan Baker
  7. ‘Frequency Domain Normal Map Filtering’ by Charles Han, Bo Sun, Ravi Ramamoorthi, Eitan Grinspun
  8. ‘Linear Efficient Antialiased Displacement and Reflectance Mapping’ by Jonathan Dupuy, Eric Heitz, Jean-Claude Iehl, Pierre Poulin, Fabrice Neyret
  9. ‘Crafting a Next-Gen Material Pipeline for The Order: 1886’ by David Neubelt and Matt Pettineo
