Perspective Transforms
               

        By Andre Yew (andrey@gluttony.ugcs.caltech.edu)



    This is how I learned perspective transforms --- it was
intuitive and understandable to me, so perhaps it'll be to
others as well.  It does require knowledge of matrix math
and homogeneous coordinates.  IMO, if you want to write a
serious renderer, you need to know both.

   First, let's look at what we're trying to do:
               S (screen)
               |    * P (y, z)
               |   /|
               |  / |
               | /  |
               |/   |
               * R  |
             / |    |
            /  |    |
           /   |    |
   E (eye)/    |    | W
---------*-----|----*-------------
         <- d -><-z->

   E is the eye, P is the point we're trying to project, and
R is its projected position on the screen S (this is the point
you want to draw on your monitor).  Z goes into the monitor (left-
handed coordinates), with X and Y being the width and height of the
screen.  So let's find where R is:

    R = (xs, ys)

    Using similar triangles (ERS and EPW)

    xs/d = x/(z + d)
    ys/d = y/(z + d)
    (Use similar triangles to determine this)

    So,

    xs = x*d/(z + d)
    ys = y*d/(z + d)

    Express this homogeneously:

    R = (xs, ys, zs, ws).

    Make xs = x*d
         ys = y*d
         zs = 0 (the screen is a flat plane)
         ws = z + d

    and express this as a vector transformed by a matrix:

    [x y z 1][ d 0 0 0 ]
             [ 0 d 0 0 ]    =  R
             [ 0 0 0 1 ]
             [ 0 0 0 d ]

    The matrix on the right side can be called a perspective transform.
But we aren't done yet.  See the zero in the 3rd column, 3rd row of
the matrix?  Make it a 1 so we retain the z value (perhaps for some
kind of Z-buffer).  Also, this isn't exactly what we want since we'd
also like to have the eye at the origin and we'd like to specify some
kind of field-of-view.  So, let's translate the matrix (we'll call
it M) by -d to move the eye to the origin:

    [ 1 0 0  0 ][ d 0 0 0 ]
    [ 0 1 0  0 ][ 0 d 0 0 ]
    [ 0 0 1  0 ][ 0 0 1 1 ]  <--- Remember, we put a 1 in (3,3) to
    [ 0 0 -d 1 ][ 0 0 0 d ]       retain the z part of the vector.

    And we get:

    [ d 0 0  0 ]
    [ 0 d 0  0 ]
    [ 0 0 1  1 ]
    [ 0 0 -d 0 ]

    Now parametrize d by the angle PEW, which is half the field-of-view
(FOV/2).  So we now want to pick a d such that ys = 1 always and we get
a nice relationship:

    d = cot( FOV/2 )

    Or, to put it another way, using this formula, ys = 1 always.

    Replace all the d's in the last perspective matrix and multiply
through by sin's:

    [ cos 0   0    0   ]
    [ 0   cos 0    0   ]
    [ 0   0   sin  sin ]
    [ 0   0   -cos 0   ]

    With all the trig functions taking FOV/2 as their arguments.
Let's refine this a little further and add near and far Z-clipping
planes.  Look at the lower right 2x2 matrix:

   [ sin sin ]
   [-cos 0   ]

   and replace the first column by a and b:

   [ a sin ]
   [ b 0   ]
   [ b 0   ]

   Transform out near and far boundaries represented homogeneously
as (zn, 1), (zf, 1), respectively and we get:

   (zn*a + b, zn*sin) and (zf*a + b, zf*sin).

   We want the transformed boundaries to map to 0 and 1, respectively,
so divide out the homogeneous parts to get normal coordinates and equate:

    (zn*a + b)/(zn*sin) = 0 (near plane)
    (zf*a + b)/(zf*sin) = 1 (far plane)

   Now solve for a and b and we get:

   a = (zf*sin)/(zf - zn)
     = sin/(1 - zn/zf)
   b = -a*zn
   b = -a*zn

   At last we have the familiar looking perspective transform matrix:

   [ cos( FOV/2 ) 0                        0            0 ]
   [ 0            cos( FOV/2 )             0            0 ]
   [ 0            0 sin( FOV/2 )/(1 - zn/zf) sin( FOV/2 ) ]
   [ 0            0                    -a*zn            0 ]

   There are some pretty neat properties of the matrix.  Perhaps
the most interesting is how it transforms objects that go through
the camera plane, and how coupled with a clipper set up the right
way, it does everything correctly.  What's interesting about this
is how it warps space into something called Moebius space, which
is kind of like a fortune-cookie except the folds pass through
each other to connect the lower folds --- you really have to see
it to understand it.  Try feeding it some vectors that go off to
infinity in various directions (ws = 0) and see where they come
out.