© 2023 Matt Pharr, Wenzel Jakob, and Greg Humphreys
This work is subject to a Creative Commons CC-BY-ND-NC license.
Subject to such license, all rights are reserved.
The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.
This book was set in Minion, East Bloc ICG Open, and Univers by Windfall Software. Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Names: Pharr, Matt, author. | Jakob, Wenzel, author. | Humphreys, Greg, author.
Title: Physically based rendering : from theory to implementation / Matt Pharr, Wenzel Jakob, Greg Humphreys.
Description: Fourth edition. | Cambridge : The MIT Press, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022014718 (print) | LCCN 2022014719 (ebook) |
ISBN 9780262048026 | ISBN 9780262374033 (epub) | ISBN 9780262374040 (pdf)
Subjects: LCSH: Computer graphics. | Three-dimensional display systems. |
Image processing–Digital techniques.
Classification: LCC T385 .P486 2022 (print) | LCC T385 (ebook) |
DDC 006.6–dc23/eng/20220919
LC record available at https://lccn.loc.gov/2022014718
LC ebook record available at https://lccn.loc.gov/2022014719
10 9 8 7 6 5 4 3 2 1
ABOUT THE AUTHORS
Matt Pharr is a Distinguished Research Scientist at NVIDIA. He has previously worked at Google, co-founded Neoptica, which was acquired by Intel, and co-founded Exluna, which was acquired by NVIDIA. He has a B.S. degree from Yale and a Ph.D. from the Stanford Graphics Lab, where he worked under the supervision of Pat Hanrahan.
Wenzel Jakob is an assistant professor in the School of Computer and Communication Sciences at École Polytechnique Fédérale de Lausanne (EPFL). His research revolves around inverse and differentiable graphics, material appearance modeling, and physically based rendering. Wenzel obtained his Ph.D. at Cornell University under the supervision of Steve Marschner, after which he joined ETH Zürich for postdoctoral studies under the supervision of Olga Sorkine-Hornung. Wenzel is also the lead developer of the Mitsuba renderer, a research-oriented rendering system.
Greg Humphreys is currently an engineer at a stealth startup. He has also been part of the Chrome graphics team at Google and the OptiX GPU ray-tracing team at NVIDIA. In a former life, he was a professor of Computer Science at the University of Virginia, where he conducted research in both high-performance and physically based computer graphics, as well as computer architecture and visualization. Greg has a B.S.E. degree from Princeton and a Ph.D. in Computer Science from Stanford under the supervision of Pat Hanrahan. When he’s not tracing rays, Greg can usually be found playing tournament bridge.
Contents
1.1.1 Indexing and Cross-Referencing
1.2 Photorealistic Rendering and the Ray-Tracing Algorithm
1.2.2 Ray–Object Intersections
1.2.5 Light Scattering at Surfaces
1.2.6 Indirect Light Transport
1.3.4 ImageTileIntegrator and the Main Rendering Loop
1.3.5 RayIntegrator Implementation
1.4 How to Proceed through This Book
1.5 Using and Understanding the Code
1.5.1 Source Code Organization
1.5.4 Abstraction versus Efficiency
1.5.10 Parallelism and Thread Safety
1.6 A Brief History of Physically Based Rendering
CHAPTER 02. MONTE CARLO INTEGRATION
2.1.1 Background and Probability Review
2.1.3 The Monte Carlo Estimator
2.1.4 Error in Monte Carlo Estimators
2.2.3 Multiple Importance Sampling
2.3 Sampling Using the Inversion Method
2.4 Transforming between Distributions
2.4.1 Transformation in Multiple Dimensions
2.4.2 Sampling with Multidimensional Transformations
CHAPTER 03. GEOMETRY AND TRANSFORMATIONS
3.1.1 Coordinate System Handedness
3.3.1 Normalization and Vector Length
3.3.3 Coordinate System from a Vector
3.8.3 Spherical Parameterizations
3.9.2 Transform Class Definition
3.9.6 x, y, and z Axis Rotations
3.9.7 Rotation around an Arbitrary Axis
3.9.8 Rotating One Vector to Another
3.9.9 The Look-at Transformation
3.10.6 Composition of Transformations
3.10.7 Transformations and Coordinate System Handedness
3.10.9 Animating Transformations
CHAPTER 04. RADIOMETRY, SPECTRA, AND COLOR
4.1.2 Incident and Exitant Radiance Functions
4.1.3 Radiometric Spectral Distributions
4.1.4 Luminance and Photometry
4.2 Working with Radiometric Integrals
4.2.1 Integrals over Projected Solid Angle
4.2.2 Integrals over Spherical Coordinates
4.5 Representing Spectral Distributions
4.5.2 General Spectral Distributions
4.5.4 Sampled Spectral Distributions
4.6.5 Choosing the Number of Wavelength Samples
5.1.1 Camera Coordinate Spaces
5.2.3 The Thin Lens Model and Depth of Field
5.4.1 The Camera Measurement Equation
5.4.2 Modeling Sensor Response
5.4.5 Common Film Functionality
6.1.2 Ray–Bounds Intersections
6.1.4 Intersection Coordinate Spaces
6.5.1 Mesh Representation and Storage
6.5.3 Ray–Triangle Intersection
6.8.1 Floating-Point Arithmetic
6.8.2 Conservative Ray–Bounds Intersections
6.8.3 Accurate Quadratic Discriminants
6.8.4 Robust Triangle Intersections
6.8.5 Bounding Intersection Point Error
6.8.6 Robust Spawned Ray Origins
6.8.7 Avoiding Intersections behind Ray Origins
CHAPTER 07. PRIMITIVES AND INTERSECTION ACCELERATION
7.1 Primitive Interface and Geometric Primitives
7.1.2 Object Instancing and Primitives in Motion
7.3 Bounding Volume Hierarchies
7.3.2 The Surface Area Heuristic
7.3.3 Linear Bounding Volume Hierarchies
7.3.4 Compact BVH for Traversal
7.3.5 Bounding and Intersection Tests
CHAPTER 08. SAMPLING AND RECONSTRUCTION
8.1.1 The Frequency Domain and the Fourier Transform
8.1.2 Ideal Sampling and Reconstruction
8.1.5 Sampling and Aliasing in Rendering
8.1.6 Spectral Analysis of Sampling Patterns
* 8.2.1 Fourier Analysis of Variance
8.2.2 Low Discrepancy and Quasi Monte Carlo
8.6.1 Hammersley and Halton Points
8.6.2 Randomization via Scrambling
8.6.3 Halton Sampler Implementation
8.7.1 Stratification over Elementary Intervals
8.7.2 Randomization and Scrambling
8.7.6 Blue Noise Sobol Sampler
9.1.1 Geometric Setting and Conventions
9.1.3 Hemispherical Reflectance
9.1.4 Delta Distributions in BSDFs
9.3 Specular Reflection and Transmission
9.3.3 The Law of Specular Reflection
9.3.6 The Fresnel Equations for Conductors
* 9.5.2 Non-Symmetric Scattering and Refraction
9.6 Roughness Using Microfacet Theory
9.6.1 The Microfacet Distribution
9.6.3 The Masking-Shadowing Function
9.6.4 Sampling the Distribution of Visible Normals
9.6.5 The Torrance–Sparrow Model
9.9.6 Scattering Model Evaluation
9.9.8 Hair Absorption Coefficients
CHAPTER 10. TEXTURES AND MATERIALS
10.1 Texture Sampling and Antialiasing
10.1.1 Finding the Texture Sampling Rate
10.1.2 Ray Differentials at Medium Transitions
* 10.1.3 Ray Differentials for Specular Reflection and Transmission
10.1.4 Filtering Texture Functions
10.2 Texture Coordinate Generation
10.3 Texture Interface and Basic Textures
10.4.1 Texture Memory Management
10.4.2 Image Texture Evaluation
10.5 Material Interface and Implementations
10.5.1 Material Implementations
10.5.2 Finding the BSDF at a Surface
11.1 Volume Scattering Processes
11.1.3 Out Scattering and Attenuation
11.3.1 The Henyey–Greenstein Phase Function
12.1.1 Photometric Light Specification
12.2.2 Texture Projection Lights
12.2.3 Goniophotometric Diagram Lights
12.5.1 Uniform Infinite Lights
* 12.5.3 Portal Image Infinite Lights
CHAPTER 13. LIGHT TRANSPORT I: SURFACE REFLECTION
13.1 The Light Transport Equation
13.1.2 Analytic Solutions to the LTE
13.1.3 The Surface Form of the LTE
13.1.5 Delta Distributions in the Integrand
13.1.6 Partitioning the Integrand
13.2.3 Incremental Path Construction
CHAPTER 14. LIGHT TRANSPORT II: VOLUME RENDERING
14.1.1 Null-Scattering Extension
14.1.2 Evaluating the Equation of Transfer
14.1.3 Sampling the Majorant Transmittance
* 14.1.4 Generalized Path Space
* 14.1.5 Evaluating the Volumetric Path Integral
14.2 Volume Scattering Integrators
14.2.1 A Simple Volumetric Integrator
* 14.2.2 Improving the Sampling Techniques
* 14.2.3 Improved Volumetric Integrator
14.3 Scattering from Layered Materials
14.3.1 The One-Dimensional Equation of Transfer
14.3.3 Coated Diffuse and Coated Conductor Materials
* CHAPTER 15. WAVEFRONT RENDERING ON GPUS
15.1 Mapping Path Tracing to the GPU
15.1.2 Structuring Rendering Computation
15.2 Implementation Foundations
15.2.1 Execution and Memory Space Specification
15.2.2 Launching Kernels on the GPU
15.2.3 Structure-of-Arrays Layout
15.3 Path Tracer Implementation
CHAPTER 16. RETROSPECTIVE AND THE FUTURE
16.2.2 Preshaded Micropolygon Grids
16.2.4 Interactive and Animation Rendering
16.2.5 Specialized Compilation
16.3.1 Inverse and Differentiable Rendering
16.3.2 Machine Learning and Rendering
APPENDIXES
C PROCESSING THE SCENE DESCRIPTION
INDEX OF CLASSES AND THEIR MEMBERS
INDEX OF MISCELLANEOUS IDENTIFIERS
_________________
* An asterisk denotes a section with advanced content that can be skipped on a first reading.
[Just as] other information should be available to those who want to learn and understand, program source code is the only means for programmers to learn the art from their predecessors. It would be unthinkable for playwrights not to allow other playwrights to read their plays [or to allow them] at theater performances where they would be barred even from taking notes. Likewise, any good author is well read, as every child who learns to write will read hundreds of times more than it writes. Programmers, however, are expected to invent the alphabet and learn to write long novels all on their own. Programming cannot grow and learn unless the next generation of programmers has access to the knowledge and information gathered by other programmers before them. —Erik Naggum
Rendering is a fundamental component of computer graphics. At the highest level of abstraction, rendering is the process of converting a description of a three-dimensional scene into an image. Algorithms for animation, geometric modeling, texturing, and other areas of computer graphics all must pass their results through some sort of rendering process so that they can be made visible in an image. Rendering has become ubiquitous; from movies to games and beyond, it has opened new frontiers for creative expression, entertainment, and visualization.
In the early years of the field, research in rendering focused on solving fundamental problems such as determining which objects are visible from a given viewpoint. As effective solutions to these problems have been found and as richer and more realistic scene descriptions have become available thanks to continued progress in other areas of graphics, modern rendering has grown to include ideas from a broad range of disciplines, including physics and astrophysics, astronomy, biology, psychology and the study of perception, and pure and applied mathematics. The interdisciplinary nature of rendering is one of the reasons that it is such a fascinating area of study.
This book presents a selection of modern rendering algorithms through the documented source code for a complete rendering system. Nearly all of the images in this book, including the one on the front cover, were rendered by this software. All of the algorithms that came together to generate these images are described in these pages. The system, pbrt, is written using a programming methodology called literate programming that mixes prose describing the system with the source code that implements it. We believe that the literate programming approach is a valuable way to introduce ideas in computer graphics and computer science in general. Often, some of the subtleties of an algorithm can be unclear or hidden until it is implemented, so seeing an actual implementation is a good way to acquire a solid understanding of that algorithm’s details. Indeed, we believe that deep understanding of a number of carefully selected algorithms in this manner provides a better foundation for further study of computer graphics than does superficial understanding of many.
In addition to clarifying how an algorithm is implemented in practice, presenting these algorithms in the context of a complete and nontrivial software system also allows us to address issues in the design and implementation of medium-sized rendering systems. The design of a rendering system’s basic abstractions and interfaces has substantial implications for both the elegance of the implementation and the ability to extend it later, yet the trade-offs in this design space are rarely discussed.
pbrt and the contents of this book focus exclusively on photorealistic rendering, which can be defined variously as the task of generating images that are indistinguishable from those that a camera would capture in a photograph or as the task of generating images that evoke the same response from a human observer as looking at the actual scene. There are many reasons to focus on photorealism. Photorealistic images are crucial for special effects in movies because computer-generated imagery must often be mixed seamlessly with footage of the real world. In applications like computer games where all of the imagery is synthetic, photorealism is an effective tool for making the observer forget that he or she is looking at an environment that does not actually exist. Finally, photorealism gives a reasonably well-defined metric for evaluating the quality of the rendering system’s output.
AUDIENCE
There are three main audiences that this book is intended for. The first is students in graduate or upper-level undergraduate computer graphics classes. This book assumes existing knowledge of computer graphics at the level of an introductory college-level course, although certain key concepts such as basic vector geometry and transformations will be reviewed here. For students who do not have experience with programs that have tens of thousands of lines of source code, the literate programming style gives a gentle introduction to this complexity. We pay special attention to explaining the reasoning behind some of the key interfaces and abstractions in the system in order to give these readers a sense of why the system is structured in the way that it is.
The second audience is advanced graduate students and researchers in computer graphics. For those doing research in rendering, the book provides a broad introduction to the area, and the pbrt source code provides a foundation that can be useful to build upon (or at least to use bits of source code from). For those working in other areas of computer graphics, we believe that having a thorough understanding of rendering can be helpful context to carry along.
Our final audience is software developers in industry. Although many of the basic ideas in this book will be familiar to this audience, seeing explanations of the algorithms presented in the literate style may lead to new perspectives. pbrt also includes carefully crafted and debugged implementations of many algorithms that can be challenging to implement correctly; these should be of particular interest to experienced practitioners in rendering. We hope that delving into one particular organization of a complete and nontrivial rendering system will also be thought provoking to this audience.
OVERVIEW AND GOALS
pbrt is based on the ray-tracing algorithm. Ray tracing is an elegant technique that has its origins in lens making; Carl Friedrich Gauß traced rays through lenses by hand in the 19th century. Ray-tracing algorithms on computers follow the path of infinitesimal rays of light through the scene until they intersect a surface. This approach gives a simple method for finding the first visible object as seen from any particular position and direction and is the basis for many rendering algorithms.
pbrt was designed and implemented with three main goals in mind: it should be complete, it should be illustrative, and it should be physically based.
Completeness implies that the system should not lack key features found in high-quality commercial rendering systems. In particular, it means that important practical issues, such as antialiasing, robustness, numerical precision, and the ability to efficiently render complex scenes, should all be addressed thoroughly. It is important to consider these issues from the start of the system’s design, since these features can have subtle implications for all components of the system and can be quite difficult to retrofit into the system at a later stage of implementation.
Our second goal means that we tried to choose algorithms, data structures, and rendering techniques with care and with an eye toward readability and clarity. Since their implementations will be examined by more readers than is the case for other rendering systems, we tried to select the most elegant algorithms that we were aware of and implement them as well as possible. This goal also required that the system be small enough for a single person to understand completely. We have implemented pbrt using an extensible architecture, with the core of the system implemented in terms of a set of carefully designed interface classes, and as much of the specific functionality as possible in implementations of these interfaces. The result is that one does not need to understand all of the specific implementations in order to understand the basic structure of the system. This makes it easier to delve deeply into parts of interest and skip others, without losing sight of how the overall system fits together.
There is a tension between the two goals of being complete and being illustrative. Implementing and describing every possible useful technique would not only make this book unacceptably long, but would also make the system prohibitively complex for most readers. In cases where pbrt lacks a particularly useful feature, we have attempted to design the architecture so that the feature could be added without altering the overall system design.
The basic foundations for physically based rendering are the laws of physics and their mathematical expression. pbrt was designed to use the correct physical units and concepts for the quantities it computes and the algorithms it implements. pbrt strives to compute images that are physically correct; they accurately reflect the lighting as it would be in a real-world version of the scene.1 One advantage of the decision to use a physical basis is that it gives a concrete standard of program correctness: for simple scenes, where the expected result can be computed in closed form, if pbrt does not compute the same result, we know there must be a bug in the implementation. Similarly, if different physically based lighting algorithms in pbrt give different results for the same scene, or if pbrt does not give the same results as another physically based renderer, there is certainly an error in one of them. Finally, we believe that this physically based approach to rendering is valuable because it is rigorous. When it is not clear how a particular computation should be performed, physics gives an answer that guarantees a consistent result.
Efficiency was given lower priority than these three goals. Since rendering systems often run for many minutes or hours in the course of generating an image, efficiency is clearly important. However, we have mostly confined ourselves to algorithmic efficiency rather than low-level code optimization. In some cases, obvious micro-optimizations take a backseat to clear, well-organized code, although we did make some effort to optimize the parts of the system where most of the computation occurs.
In the course of presenting pbrt and discussing its implementation, we hope to convey some hard-learned lessons from years of rendering research and development. There is more to writing a good renderer than stringing together a set of fast algorithms; making the system both flexible and robust is a difficult task. The system’s performance must degrade gracefully as more geometry or light sources are added to it or as any other axis of complexity is stressed.
The rewards for developing a system that addresses all these issues are enormous—it is a great pleasure to write a new renderer or add a new feature to an existing renderer and use it to create an image that could not be generated before. Our most fundamental goal in writing this book was to bring this opportunity to a wider audience. Readers are encouraged to use the system to render the example scenes in the pbrt software distribution as they progress through the book. Exercises at the end of each chapter suggest modifications to the system that will help clarify its inner workings and more complex projects to extend the system by adding new features.
The website for this book is located at pbrt.org. This site includes links to the pbrt source code, scenes that can be downloaded to render with pbrt, and a bug tracker, as well as errata. Any errors in this text that are not listed in the errata can be reported to the email address authors@pbrt.org. We greatly value your feedback!
CHANGES BETWEEN THE FIRST AND SECOND EDITIONS
Six years passed between the publication of the first edition of this book in 2004 and the second edition in 2010. In that time, thousands of copies of the book were sold, and the pbrt software was downloaded thousands of times from the book’s website. The pbrt user base gave us a significant amount of feedback and encouragement, and our experience with the system guided many of the decisions we made in making changes between the version of pbrt presented in the first edition and the version in the second edition. In addition to a number of bug fixes, we also made several significant design changes and enhancements:
CHANGES BETWEEN THE SECOND AND THIRD EDITIONS
With the passage of another six years, it was time to update and extend the book and the pbrt system. We continued to learn from readers’ and users’ experiences to better understand which topics were most useful to cover. Further, rendering research continued apace; many parts of the book were due for an update to reflect current best practices. We made significant improvements on a number of fronts:
Many other parts of the system were improved and updated to reflect progress in the field: microfacet reflection models were treated in more depth, with much better sampling techniques; a new “curve” shape was added for modeling hair and other fine geometry; and a new camera model that simulates realistic lens systems was made available. Throughout the book, we made numerous smaller changes to more clearly explain and illustrate the key concepts in physically based rendering systems like pbrt.
CHANGES BETWEEN THE THIRD AND FOURTH EDITIONS
Innovation in rendering algorithms has shown no sign of slowing down, and so in 2019 we began focused work on a fourth edition of the text. Not only does almost every chapter include substantial additions, but we have updated the order of chapters and ideas introduced, bringing Monte Carlo integration and the basic ideas of path tracing to the fore rather than saving them for the end.
Capabilities of the system that have seen especially significant improvements include:
The system has seen numerous other improvements and additions, including a new bilinear patch shape, many updates to the sample-generation algorithms that are at the heart of Monte Carlo integration, support for outputting auxiliary information at each pixel about the visible surface geometry and reflection properties, and many more small improvements to the system.
ACKNOWLEDGMENTS
Pat Hanrahan has contributed to this book in more ways than we could hope to acknowledge; we owe a profound debt to him. He tirelessly argued for clean interfaces and finding the right abstractions to use throughout the system, and his understanding of and approach to rendering deeply influenced its design. His willingness to use pbrt and this manuscript in his rendering course at Stanford was enormously helpful, particularly in the early years of its life when it was still in very rough form; his feedback throughout this process has been crucial for bringing the text to its current state. Finally, the group of people that Pat helped assemble at the Stanford Graphics Lab, and the open environment that he fostered, made for an exciting, stimulating, and fertile environment. Matt and Greg both feel extremely privileged to have been there.
We owe a debt of gratitude to the many students who used early drafts of this book in courses at Stanford and the University of Virginia between 1999 and 2004. These students provided an enormous amount of feedback about the book and pbrt. The teaching assistants for these courses deserve special mention: Tim Purcell, Mike Cammarano, Ian Buck, and Ren Ng at Stanford, and Nolan Goodnight at Virginia. A number of students in those classes gave particularly valuable feedback and sent bug reports and bug fixes; we would especially like to thank Evan Parker and Phil Beatty. A draft of the manuscript of this book was used in classes taught by Bill Mark and Don Fussell at the University of Texas, Austin, and Raghu Machiraju at Ohio State University; their feedback was invaluable, and we are grateful for their adventurousness in incorporating this system into their courses, even while it was still being edited and revised.
Matt Pharr would like to acknowledge colleagues and co-workers in rendering-related endeavors who have been a great source of education and who have substantially influenced his approach to writing renderers and his understanding of the field. Particular thanks go to Craig Kolb, who provided a cornerstone of Matt’s early computer graphics education through the freely available source code to the rayshade ray-tracing system, and Eric Veach, who has also been generous with his time and expertise. Thanks also to Doug Shult and Stan Eisenstat for formative lessons in mathematics and computer science during high school and college, respectively, and most important to Matt’s parents, for the education they have provided and continued encouragement along the way. Finally, thanks to NVIDIA for supporting the preparation of both the first and this latest edition of the book; at NVIDIA, thanks to Nick Triantos and Jayant Kolhe for their support through the final stages of the preparation of the first edition and thanks to Aaron Lefohn, David Luebke, and Bill Dally for their support of work on the fourth edition.
Greg Humphreys is very grateful to all the professors and TAs who tolerated him when he was an undergraduate at Princeton. Many people encouraged his interest in graphics, specifically Michael Cohen, David Dobkin, Adam Finkelstein, Michael Cox, Gordon Stoll, Patrick Min, and Dan Wallach. Doug Clark, Steve Lyon, and Andy Wolfe also supervised various independent research boondoggles without even laughing once. Once, in a group meeting about a year-long robotics project, Steve Lyon became exasperated and yelled, “Stop telling me why it can’t be done, and figure out how to do it!”—an impromptu lesson that will never be forgotten. Eric Ristad fired Greg as a summer research assistant after his freshman year (before the summer even began), pawning him off on an unsuspecting Pat Hanrahan and beginning an advising relationship that would span 10 years and both coasts. Finally, Dave Hanson taught Greg that literate programming was a great way to work and that computer programming can be a beautiful and subtle art form.
Wenzel Jakob was excited when the first edition of pbrt arrived in his mail during his undergraduate studies in 2004. Needless to say, this had a lasting effect on his career—thus Wenzel would like to begin by thanking his co-authors for inviting him to become a part of the third and fourth editions of this book. Wenzel is extremely indebted to Steve Marschner, who was his Ph.D. advisor during a fulfilling five years at Cornell University. Steve brought him into the world of research and remains a continuous source of inspiration. Wenzel is also thankful for the guidance and stimulating research environment created by the other members of the graphics group, including Kavita Bala, Doug James, and Bruce Walter. Wenzel spent a wonderful postdoc with Olga Sorkine-Hornung, who introduced him to geometry processing. Olga’s support for Wenzel’s involvement in the third edition of this book is deeply appreciated.
We would especially like to thank the reviewers who read drafts in their entirety; all had insightful and constructive feedback about the manuscript at various stages of its progress. For providing feedback on both the first and second editions of the book, thanks to Ian Ashdown, Per Christensen, Doug Epps, Dan Goldman, Eric Haines, Erik Reinhard, Pete Shirley, Peter-Pike Sloan, Greg Ward, and a host of anonymous reviewers. For the second edition, thanks to Janne Kontkanen, Bill Mark, Nelson Max, and Eric Tabellion. For the fourth edition, we are grateful to Thomas Müller and Per Christensen, who both offered extensive feedback that has measurably improved the final version.
Many experts have kindly explained subtleties in their work to us and guided us to best practices. For the first and second editions, we are also grateful to Don Mitchell, for his help with understanding some of the details of sampling and reconstruction; Thomas Kollig and Alexander Keller, for explaining the finer points of low-discrepancy sampling; Christer Ericson, who had a number of suggestions for improving our kd-tree implementation; and Christophe Hery and Eugene d’Eon for helping us with the nuances of subsurface scattering.
For the third edition, we would especially like to thank Leo Grünschloß for reviewing our sampling chapter; Alexander Keller for suggestions about topics for that chapter; Eric Heitz for extensive help with details of microfacets and reviewing our text on that topic; Thiago Ize for thoroughly reviewing the text on floating-point error; Tom van Bussel for reporting a number of errors in our BSSRDF code; Ralf Habel for reviewing our BSSRDF text; and Toshiya Hachisuka and Anton Kaplanyan for extensive review and comments about our light transport chapters.
For the fourth edition, thanks to Alejandro Conty Estevez for reviewing our treatment of many-light sampling; Eugene d’Eon, Bailey Miller, and Jan Novák for comments on the volumetric scattering chapters; Eric Haines, Simon Kallweit, Martin Stich, and Carsten Wächter for reviewing the chapter on GPU rendering; Karl Li for feedback on a number of chapters; Tzu-Mao Li for his review of our discussion of inverse and differentiable rendering; Fabrice Rousselle for feedback on machine learning and rendering; and Gurprit Singh for comments on our discussion of Fourier analysis of Monte Carlo integration. We also appreciate extensive comments and suggestions from Jeppe Revall Frisvad on pbrt’s treatment of reflection models in previous editions.
For improvements to pbrt’s implementation in this edition, thanks to Pierre Moreau for his efforts in debugging pbrt’s GPU support on Windows and to Jim Price, who not only found and fixed numerous bugs in the early release of pbrt’s source code, but who also contributed a better representation of chromatic volumetric media than our original implementation. We are also very appreciative of Anders Langlands and Luca Fascione of Weta Digital for providing an implementation of their PhysLight system, which has been incorporated into pbrt’s PixelSensor class and light source implementations.
Many people have reported errors in the text of previous editions or bugs in pbrt. We’d especially like to thank Solomon Boulos, Stephen Chenney, Per Christensen, John Danks, Mike Day, Kevin Egan, Volodymyr Kachurovskyi, Kostya Smolenskiy, Ke Xu, and Arek Zimny, who have been especially prolific.
For their suggestions and bug reports, we would also like to thank Rachit Agrawal, Frederick Akalin, Thomas de Bodt, Mark Bolstad, Brian Budge, Jonathon Cai, Bryan Catanzaro, Tzu-Chieh Chang, Mark Colbert, Yunjian Ding, Tao Du, Marcos Fajardo, Shaohua Fan, Luca Fascione, Etienne Ferrier, Nigel Fisher, Jeppe Revall Frisvad, Robert G. Graf, Asbjørn Heid, Steve Hill, Wei-Feng Huang, John “Spike” Hughes, Keith Jeffery, Greg Johnson, Aaron Karp, Andrew Kensler, Alan King, Donald Knuth, Martin Kraus, Chris Kulla, Murat Kurt, Larry Lai, Morgan McGuire, Craig McNaughton, Don Mitchell, Swaminathan Narayanan, Anders Nilsson, Jens Olsson, Vincent Pegoraro, Srinath Ravichandiran, Andy Selle, Sébastien Speierer, Nils Thuerey, Eric Veach, Ingo Wald, Zejian Wang, Xiong Wei, Wei-Wei Xu, Tizian Zeltner, and Matthias Zwicker. Finally, we would like to thank the LuxRender developers and the LuxRender community, particularly Terrence Vergauwen, Jean-Philippe Grimaldi, and Asbjørn Heid; it has been a delight to see the rendering system they have built from pbrt’s foundation, and we have learned from reading their source code and implementations of new rendering algorithms.
Special thanks to Martin Preston and Steph Bruning from Framestore for their help with our being able to use a frame from Gravity (image courtesy of Warner Bros. and Framestore), and to Weta Digital for their help with the frame from Alita: Battle Angel (© 2018 Twentieth Century Fox Film Corporation, All Rights Reserved).
PRODUCTION
For the production of the first edition, we would also like to thank our editor Tim Cox for his willingness to take on this slightly unorthodox project and for both his direction and patience throughout the process. We are very grateful to Elisabeth Beller (project manager), who went well beyond the call of duty for the book; her ability to keep this complex project in control and on schedule was remarkable, and we particularly thank her for the measurable impact she had on the quality of the final result. Thanks also to Rick Camp (editorial assistant) for his many contributions along the way. Paul Anagnostopoulos and Jacqui Scarlott at Windfall Software did the book’s composition; their ability to take the authors’ homebrew literate programming file format and turn it into high-quality final output while also juggling the multiple unusual types of indexing we asked for is greatly appreciated. Thanks also to Ken DellaPenta (copyeditor) and Jennifer McClain (proofreader), as well as to Max Spector at Chen Design (text and cover designer) and Steve Rath (indexer).
For the second edition, we would like to thank Greg Chalson, who talked us into expanding and updating the book; Greg also ensured that Paul Anagnostopoulos at Windfall Software would again do the book’s composition. We would like to thank Paul again for his efforts in working with this book’s production complexity. Finally, we would also like to thank Todd Green, Paul Gottehrer, and Heather Scherer at Elsevier.
For the third edition, we would like to thank Todd Green, who oversaw that go-round, and Amy Invernizzi, who kept the train on the rails throughout that process. We were delighted to have Paul Anagnostopoulos at Windfall Software part of this process for a third time; his efforts have been critical to the book’s high production value, which is so important to us.
The fourth edition saw us moving to MIT Press; many thanks to Elizabeth Swayze for her enthusiasm for bringing us on board, guidance through the production process, and ensuring that Paul Anagnostopoulos would again handle composition. Our deepest thanks to Paul for coming back for one more edition with us, and many thanks as well to MaryEllen Oliver for her superb work on copyediting and proofreading.
SCENES, MODELS, AND DATA
Many people and organizations have generously provided scenes and models for use in this book and the pbrt distribution. Their generosity has been invaluable in helping us create interesting example images throughout the text.
We are most grateful to Guillermo M. Leal Llaguno of Evolución Visual, www.evvisual.com, who modeled and rendered the iconic San Miguel scene that was featured on the cover of the second edition and is still used in numerous figures in the book. We would also especially like to thank Marko Dabrovic (www.3lhd.com) and Mihovil Odak at RNA Studios (www.rna.hr), who supplied a bounty of models and scenes used in earlier editions of the book, including the Sponza atrium, the Sibenik cathedral, and the Audi TT car model that can be seen in Figure 16.1 of this edition.
We sincerely thank Jan-Walter Schliep, Burak Kahraman, and Timm Dapper of Laubwerk (www.laubwerk.com) for creating the Countryside landscape scene that was on the cover of the previous edition of the book and is used in numerous figures in this edition.
Many thanks to Angelo Ferretti of Lucydreams (www.lucydreams.it) for licensing the Watercolor and Kroken scenes, which have provided a wonderful cover image for this edition, material for numerous figures, and a pair of complex scenes that exercise pbrt’s capabilities.
Jim Price kindly provided a number of scenes featuring interesting volumetric media; those have measurably improved the figures for that topic. Thanks also to Beeple for making the Zero Day and Transparent Machines scenes available under a permissive license and to Martin Lubich for the Austrian Imperial Crown model. Finally, our deepest thanks to Walt Disney Animation Studios for making the production-complexity Moana Island scene available as well as providing the detailed volumetric cloud model.
The bunny, Buddha, and dragon models are courtesy of the Stanford Computer Graphics Laboratory’s scanning repository. The “killeroo” model is included with permission of Phil Dench and Martin Rezard (3D scan and digital representations by headus, design and clay sculpt by Rezard). The dragon model scan used in Chapter 9 is courtesy of Christian Schüller, and our thanks to Yasutoshi Mori for the material orb and the sports car model. The head model used to illustrate subsurface scattering was made available by Infinite Realities, Inc. under a Creative Commons Attribution 3.0 license. Thanks also to “tyrant monkey” for the BMW M6 car model and “Wig42” for the breakfast table scene; both were posted to blendswap.com, also under a Creative Commons Attribution 3.0 license.
We have made use of numerous environment maps from the PolyHaven website (polyhaven.com) for HDR lighting in various scenes; all are available under a Creative Commons CC0 license. Thanks to Sergej Majboroda and Greg Zaal, whose environment maps we have used.
Marc Ellens provided spectral data for a variety of light sources, and the spectral RGB measurement data for a variety of displays is courtesy of Tom Lianza at X-Rite. Our thanks as well to Danny Pascale (www.babelcolor.com) for allowing us to include his measurements of the spectral reflectance of a color chart. Thanks to Mikhail Polyanskiy for index of refraction data via refractiveindex.info and to Anders Langlands, Luca Fascione, and Weta Digital for camera sensor response data that is included in pbrt.
ABOUT THE COVER
The Watercolor scene on the cover was created by Angelo Ferretti of Lucydreams (www.lucydreams.it). It requires a total of 2 GiB of on-disk storage for geometry and 836 MiB for texture maps. At rendering time, the scene description requires 15 GiB of memory to store over 33 million unique triangles, 412 texture maps, and associated data structures.
ADDITIONAL READING
Donald Knuth’s article Literate Programming (Knuth 1984) describes the main ideas behind literate programming as well as his web programming environment. The seminal TeX typesetting system was written with web and has been published as a series of books (Knuth 1986; Knuth 1993a). Knuth and Levy presented the implementation of the cweb literate programming system as a literate program (Knuth and Levy 1994). Knuth has also published both a collection of graph algorithms in The Stanford GraphBase (Knuth 1993b) and a simulator for the MMIX instruction set (Knuth 1999) in literate format. These programs are enjoyable to read and are excellent presentations of their respective algorithms. The website www.literateprogramming.com has pointers to many articles about literate programming, literate programs to download, and a variety of literate programming systems; many refinements have been made since Knuth’s original development of the idea.
Other literate programs we know of that have been published as books include one on the implementation of the lcc compiler, which was written by Christopher Fraser and David Hanson and published as A Retargetable C Compiler: Design and Implementation (Fraser and Hanson 1995). See also Hanson’s book on program interface design (Hanson 1996), Mehlhorn and Näher’s presentation on the implementation of the LEDA library (Mehlhorn and Näher 1999), Valiente’s collection of graph algorithms (Valiente 2002), and Ruckert’s description of the mp3 audio format (Ruckert 2005).
_________________
1 Of course, any computer simulation of physics requires carefully choosing approximations that trade off requirements for fidelity with computational efficiency. See Section 1.2 for further discussion of the choices made in pbrt.
Rendering is the process of producing an image from the description of a 3D scene. Obviously, this is a broad task, and there are many ways to approach it. Physically based techniques attempt to simulate reality; that is, they use principles of physics to model the interaction of light and matter. While a physically based approach may seem to be the most obvious way to approach rendering, it has only been widely adopted in practice over the past 15 or so years.
This book describes pbrt, a physically based rendering system based on the ray-tracing algorithm. It is capable of rendering realistic images of complex scenes such as the one shown in Figure 1.1. (Other than a few exceptions in this chapter that are noted with their appearance, all the images in this book are rendered with pbrt.)
Most computer graphics books present algorithms and theory, sometimes combined with snippets of code. In contrast, this book couples the theory with a complete implementation of a fully functional rendering system. Furthermore, the full source code of the system is available under an open-source license, and the full text of this book is freely available online at pbr-book.org/4ed, as of November 1, 2023. Further information, including example scenes and additional information about pbrt, can be found on the website, pbrt.org.
While creating the TeX typesetting system, Donald Knuth developed a new programming methodology based on a simple but revolutionary idea. To quote Knuth, “let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.” He named this methodology literate programming. This book (including the chapter you are reading now) is a long literate program. This means that in the course of reading this book, you will read the full implementation of the pbrt rendering system, not just a high-level description of it.
Literate programs are written in a metalanguage that mixes a document formatting language (e.g., TEX or HTML) and a programming language (e.g., C++). Two separate systems process the program: a “weaver” that transforms the literate program into a document suitable for typesetting and a “tangler” that produces source code suitable for compilation. Our literate programming system is homegrown, but it was heavily influenced by Norman Ramsey’s noweb system.
Figure 1.1: A Scene Rendered by pbrt. The Kroken scene features complex geometry, materials, and light transport. Handling all of these effects well in a rendering system makes it possible to render photorealistic images like this one. This scene and many others can be downloaded from the pbrt website. (Scene courtesy of Angelo Ferretti.)
The literate programming metalanguage provides two important features. The first is the ability to mix prose with source code. This feature puts the description of the program on equal footing with its actual source code, encouraging careful design and documentation. Second, the language provides mechanisms for presenting the program code to the reader in an order that is entirely different from the compiler input. Thus, the program can be described in a logical manner. Each named block of code is called a fragment, and each fragment can refer to other fragments by name.
As a simple example, consider a function InitGlobals() that is responsible for initializing all of a program’s global variables:1
void InitGlobals() {
    nMarbles = 25.7;
    shoeSize = 13;
    dielectric = true;
}
Despite its brevity, this function is hard to understand without any context. Why, for example, can the variable nMarbles take on floating-point values? Just looking at the code, one would need to search through the entire program to see where each variable is declared and how it is used in order to understand its purpose and the meanings of its legal values. Although this structuring of the system is fine for a compiler, a human reader would much rather see the initialization code for each variable presented separately, near the code that declares and uses the variable.
In a literate program, one can instead write InitGlobals() like this:
〈Function Definitions〉 ≡
void InitGlobals() {
    〈Initialize Global Variables 3〉
}
This defines a fragment, called 〈Function Definitions〉, that contains the definition of the InitGlobals() function. The InitGlobals() function itself refers to another fragment, 〈Initialize Global Variables〉. Because the initialization fragment has not yet been defined, we do not know anything about this function except that it will presumably contain assignments to global variables.
The fragment name alone is just the right level of abstraction for now, since no variables have been declared yet. When we introduce the global variable shoeSize somewhere later in the program, we can then write
〈Initialize Global Variables〉 ≡                                          3
    shoeSize = 13;
Here we have started to define the contents of 〈Initialize Global Variables〉. When the literate program is tangled into source code for compilation, the literate programming system will substitute the code shoeSize = 13; inside the definition of the InitGlobals() function.
Later in the text, we may define another global variable, dielectric, and we can append its initialization to the fragment:
〈Initialize Global Variables〉 +≡                                         3
    dielectric = true;
The +≡ symbol after the fragment name shows that we have added to a previously defined fragment.
When tangled, these three fragments turn into the code

void InitGlobals() {
    shoeSize = 13;
    dielectric = true;
}
In this way, we can decompose complex functions into logically distinct parts, making them much easier to understand. For example, we can write a complicated function as a series of fragments:
〈Function Definitions〉 +≡
void complexFunc(int x, int y, double *values) {
    〈Check validity of arguments〉
    if (x < y) {
        〈Swap x and y〉
    }
    〈Do precomputation before loop〉
    〈Loop through and update values array〉
}
Again, the contents of each fragment are expanded inline in complexFunc() for compilation. In the document, we can introduce each fragment and its implementation in turn. This decomposition lets us present code a few lines at a time, making it easier to understand. Another advantage of this style of programming is that by separating the function into logical fragments, each with a single and well-delineated purpose, each one can then be written, verified, or read independently. In general, we will try to make each fragment less than 10 lines long.
In some sense, the literate programming system is just an enhanced macro substitution package tuned to the task of rearranging program source code. This may seem like a trivial change, but in fact literate programming is quite different from other ways of structuring software systems.
1.1.1 INDEXING AND CROSS-REFERENCING
The following features are designed to make the text easier to navigate. Indices in the page margins give page numbers where the functions, variables, and methods used on that page are defined. Indices at the end of the book collect all of these identifiers so that it’s possible to find definitions by name. The index of fragments, starting on page 1183, lists the pages where each fragment is defined and where it is used. An index of class names and their members follows, starting on page 1201, and an index of miscellaneous identifiers can be found on page 1213. Within the text, a defined fragment name is followed by a list of page numbers on which that fragment is used. For example, a hypothetical fragment definition such as
〈A fascinating fragment〉 ≡                                        184, 690
    nMarbles += .001;
indicates that this fragment is used on pages 184 and 690. Occasionally we elide fragments from the printed book that are either boilerplate code or substantially the same as other fragments; when these fragments are used, no page numbers will be listed.
When a fragment is used inside another fragment, the page number on which it is first defined appears after the fragment name. For example,
〈Do something interesting〉 +≡                                          500
    InitializeSomethingInteresting();
    〈Do something else interesting 486〉
    CleanUp();
indicates that the 〈Do something else interesting〉 fragment is defined on page 486.
1.2 PHOTOREALISTIC RENDERING AND THE RAY-TRACING ALGORITHM
The goal of photorealistic rendering is to create an image of a 3D scene that is indistinguishable from a photograph of the same scene. Before we describe the rendering process, it is important to understand that in this context the word indistinguishable is imprecise because it involves a human observer, and different observers may perceive the same image differently. Although we will cover a few perceptual issues in this book, accounting for the precise characteristics of a given observer is a difficult and not fully solved problem. For the most part, we will be satisfied with an accurate simulation of the physics of light and its interaction with matter, relying on our understanding of display technology to present the best possible image to the viewer.
Given this single-minded focus on realistic simulation of light, it seems prudent to ask: what is light? Perception through light is central to our very existence, and this simple question has thus occupied the minds of famous philosophers and physicists since the beginning of recorded time. The ancient Indian philosophical school of Vaisheshika (5th–6th century BC) viewed light as a collection of small particles traveling along rays at high velocity. In the fifth century BC, the Greek philosopher Empedocles postulated that a divine fire emerged from human eyes and combined with light rays from the sun to produce vision. Between the 18th and 19th century, polymaths such as Isaac Newton, Thomas Young, and Augustin-Jean Fresnel endorsed conflicting theories modeling light as the consequence of either wave or particle propagation. During the same time period, André-Marie Ampère, Joseph-Louis Lagrange, Carl Friedrich Gauß, and Michael Faraday investigated the relations between electricity and magnetism that culminated in a sudden and dramatic unification by James Clerk Maxwell into a combined theory that is now known as electromagnetism.
Light is a wave-like manifestation in this framework: the motion of electrically charged particles such as electrons in a light bulb’s filament produces a disturbance of a surrounding electric field that propagates away from the source. The electric oscillation also causes a secondary oscillation of the magnetic field, which in turn reinforces an oscillation of the electric field, and so on. The interplay of these two fields leads to a self-propagating wave that can travel extremely large distances: millions of light years, in the case of distant stars visible in a clear night sky. In the early 20th century, work by Max Planck, Max Born, Erwin Schrödinger, and Werner Heisenberg led to another substantial shift of our understanding: at a microscopic level, elementary properties like energy and momentum are quantized, which means that they can only exist as an integer multiple of a base amount that is known as a quantum. In the case of electromagnetic oscillations, this quantum is referred to as a photon. In this sense, our physical understanding has come full circle: once we turn to very small scales, light again betrays a particle-like behavior that coexists with its overall wave-like nature.
How does our goal of simulating light to produce realistic images fit into all of this? Faced with this tower of increasingly advanced explanations, a fundamental question arises: how far must we climb this tower to attain photorealism? To our great fortune, the answer turns out to be “not far at all.” Waves comprising visible light are extremely small, measuring only a few hundred nanometers from crest to trough. The complex wave-like behavior of light appears at these small scales, but it is of little consequence when simulating objects at the scale of, say, centimeters or meters. This is excellent news, because detailed wave-level simulations of anything larger than a few micrometers are impractical: computer graphics would not exist in its current form if this level of detail was necessary to render images. Instead, we will mostly work with equations developed between the 16th and early 19th century that model light as particles that travel along rays. This leads to a more efficient computational approach based on a key operation known as ray tracing.
Ray tracing is conceptually a simple algorithm; it is based on following the path of a ray of light through a scene as it interacts with and bounces off objects in an environment. Although there are many ways to write a ray tracer, all such systems simulate at least the following objects and phenomena: cameras, ray–object intersections, light sources, visibility, light scattering at surfaces, indirect light transport, and ray propagation.
We will briefly discuss each of these simulation tasks in this section. In the next section, we will show pbrt’s high-level interface to the underlying simulation components and will present a simple rendering algorithm that randomly samples light paths through a scene in order to generate images.
Nearly everyone has used a camera and is familiar with its basic functionality: you indicate your desire to record an image of the world (usually by pressing a button or tapping a screen), and the image is recorded onto a piece of film or by an electronic sensor.2 One of the simplest devices for taking photographs is called the pinhole camera. Pinhole cameras consist of a light-tight box with a tiny hole at one end (Figure 1.2). When the hole is uncovered, light enters and falls on a piece of photographic paper that is affixed to the other end of the box. Despite its simplicity, this kind of camera is still used today, mostly for artistic purposes. Long exposure times are necessary to get enough light on the film to form an image.
Figure 1.2: A Pinhole Camera. The viewing volume is determined by the projection of the film through the pinhole.
Figure 1.3: When we simulate a pinhole camera, we place the film in front of the hole at the imaging plane, and the hole is renamed the eye.
Although most cameras are substantially more complex than the pinhole camera, it is a convenient starting point for simulation. The most important function of the camera is to define the portion of the scene that will be recorded onto the film. In Figure 1.2, we can see how connecting the pinhole to the edges of the film creates a double pyramid that extends into the scene. Objects that are not inside this pyramid cannot be imaged onto the film. Because actual cameras image a more complex shape than a pyramid, we will refer to the region of space that can potentially be imaged onto the film as the viewing volume.
Another way to think about the pinhole camera is to place the film plane in front of the pinhole but at the same distance (Figure 1.3). Note that connecting the hole to the film defines exactly the same viewing volume as before. Of course, this is not a practical way to build a real camera, but for simulation purposes it is a convenient abstraction. When the film (or image) plane is in front of the pinhole, the pinhole is frequently referred to as the eye.
Now we come to the crucial issue in rendering: at each point in the image, what color does the camera record? The answer to this question is partially determined by what part of the scene is visible at that point. If we recall the original pinhole camera, it is clear that only light rays that travel along the vector between the pinhole and a point on the film can contribute to that film location. In our simulated camera with the film plane in front of the eye, we are interested in the amount of light traveling from the image point to the eye.
Therefore, an important task of the camera simulator is to take a point on the image and generate rays along which incident light will contribute to that image location. Because a ray consists of an origin point and a direction vector, this task is particularly simple for the pinhole camera model of Figure 1.3: it uses the pinhole for the origin and the vector from the pinhole to the imaging plane as the ray’s direction. For more complex camera models involving multiple lenses, the calculation of the ray that corresponds to a given point on the image may be more involved.
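To make this concrete, the following standalone sketch generates such a ray under simplifying assumptions: the eye is at the origin and the image plane lies one unit away along the −z axis, spanning [−1, 1] in x and y. The Vec3 and SimpleRay types and these conventions are inventions of the sketch, not pbrt's Camera interface from Chapter 5.

// A minimal pinhole-camera sketch; not pbrt's Camera implementation.
struct Vec3 { float x, y, z; };

struct SimpleRay {
    Vec3 o;  // origin: the pinhole ("eye")
    Vec3 d;  // direction: from the eye through a point on the image plane
};

// Map normalized image coordinates (u, v) in [0, 1)^2 to a camera ray.
// The image plane is assumed to lie at z = -1, spanning [-1, 1] in x and y.
SimpleRay GeneratePinholeRay(float u, float v) {
    Vec3 pImage{2 * u - 1, 2 * v - 1, -1};
    return SimpleRay{Vec3{0, 0, 0}, pImage};  // ray from the eye through pImage
}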
Light arriving at the camera along a ray will generally carry different amounts of energy at different wavelengths. The human visual system interprets this wavelength variation as color. Most camera sensors record separate measurements for three wavelength distributions that correspond to red, green, and blue colors, which is sufficient to reconstruct a scene’s visual appearance to a human observer. (Section 4.6 discusses color in more detail.) Therefore, cameras in pbrt also include a film abstraction that both stores the image and models the film sensor’s response to incident light.
pbrt’s camera and film abstraction is described in detail in Chapter 5. With the process of converting image locations to rays encapsulated in the camera module and with the film abstraction responsible for determining the sensor’s response to light, the rest of the rendering system can focus on evaluating the lighting along those rays.
1.2.2 RAY–OBJECT INTERSECTIONS
Each time the camera generates a ray, the first task of the renderer is to determine which object, if any, that ray intersects first and where the intersection occurs. This intersection point is the visible point along the ray, and we will want to simulate the interaction of light with the object at this point. To find the intersection, we must test the ray for intersection against all objects in the scene and select the one that the ray intersects first. Given a ray r, we first start by writing it in parametric form:
r(t) = o + td,
where o is the ray’s origin, d is its direction vector, and t is a parameter whose legal range is [0, ∞). We can obtain a point along the ray by specifying its parametric t value and evaluating the above equation.
It is often easy to find the intersection between the ray r and a surface defined by an implicit function F (x, y, z) = 0. We first substitute the ray equation into the implicit equation, producing a new function whose only parameter is t. We then solve this function for t and substitute the smallest positive root into the ray equation to find the desired point. For example, the implicit equation of a sphere centered at the origin with radius r is
x² + y² + z² − r² = 0.
Substituting the ray equation, we have
(ox + tdx)² + (oy + tdy)² + (oz + tdz)² − r² = 0,
where subscripts denote the corresponding component of a point or vector. For a given ray and a given sphere, all the values besides t are known, giving us an easily solved quadratic equation in t. If there are no real roots, the ray misses the sphere; if there are roots, the smallest positive one gives the intersection point.
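The following standalone function carries out exactly this computation for a sphere of the given radius centered at the origin. The Vec3f type and the use of std::optional are conveniences of the sketch; pbrt's own sphere intersection code appears in Chapter 6.

// Standalone sketch of the ray-sphere intersection described above.
#include <cmath>
#include <optional>

struct Vec3f { float x, y, z; };

float Dot(Vec3f a, Vec3f b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns the smallest t > 0 at which the ray o + t*d hits the sphere, if any.
std::optional<float> IntersectSphere(Vec3f o, Vec3f d, float radius) {
    // Substituting the ray into x^2 + y^2 + z^2 - r^2 = 0 gives the quadratic
    // (d.d) t^2 + 2 (o.d) t + (o.o - r^2) = 0.
    float a = Dot(d, d);
    float b = 2 * Dot(o, d);
    float c = Dot(o, o) - radius * radius;
    float discrim = b * b - 4 * a * c;
    if (discrim < 0) return std::nullopt;    // no real roots: the ray misses
    float sqrtD = std::sqrt(discrim);
    float t0 = (-b - sqrtD) / (2 * a), t1 = (-b + sqrtD) / (2 * a);
    if (t0 > 0) return t0;                   // smallest positive root
    if (t1 > 0) return t1;                   // ray origin is inside the sphere
    return std::nullopt;                     // both intersections are behind the origin
}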
The intersection point alone is not enough information for the rest of the ray tracer; it needs to know certain properties of the surface at the point. First, a representation of the material at the point must be determined and passed along to later stages of the ray-tracing algorithm.
Figure 1.4: Moana Island Scene, Rendered by pbrt. This model from a feature film exhibits the extreme complexity of scenes rendered for movies (Walt Disney Animation Studios 2018). It features over 146 million unique triangles, though the true geometric complexity of the scene is well into the tens of billions of triangles due to extensive use of object instancing. (Scene courtesy of Walt Disney Animation Studios.)
Second, additional geometric information about the intersection point will also be required in order to shade the point. For example, the surface normal n is always required. Although many ray tracers operate with only n, more sophisticated rendering systems like pbrt require even more information, such as various partial derivatives of position and surface normal with respect to the local parameterization of the surface.
Of course, most scenes are made up of multiple objects. The brute-force approach would be to test the ray against each object in turn, choosing the minimum positive t value of all intersections to find the closest intersection. This approach, while correct, is very slow, even for scenes of modest complexity. A better approach is to incorporate an acceleration structure that quickly rejects whole groups of objects during the ray intersection process. This ability to quickly cull irrelevant geometry means that ray tracing frequently runs in O(m log n) time, where m is the number of pixels in the image and n is the number of objects in the scene.3 (Building the acceleration structure itself is necessarily at least O(n) time, however.) Thanks to the effectiveness of acceleration structures, it is possible to render highly complex scenes like the one shown in Figure 1.4 in reasonable amounts of time.
pbrt’s geometric interface and implementations of it for a variety of shapes are described in Chapter 6, and the acceleration interface and implementations are shown in Chapter 7.
The ray–object intersection stage gives us a point to be shaded and some information about the local geometry at that point. Recall that our eventual goal is to find the amount of light leaving this point in the direction of the camera. To do this, we need to know how much light is arriving at this point. This involves both the geometric and radiometric distribution of light in the scene. For very simple light sources (e.g., point lights), the geometric distribution of lighting is a simple matter of knowing the position of the lights. However, point lights do not exist in the real world, and so physically based lighting is often based on area light sources. This means that the light source is associated with a geometric object that emits illumination from its surface. However, we will use point lights in this section to illustrate the components of light distribution; a more rigorous discussion of light measurement and distribution is the topic of Chapters 4 and 12.
Figure 1.5: Geometric construction for determining the power per area arriving at a point p due to a point light source. The distance from the point to the light source is denoted by r.
Figure 1.6: Since the point light radiates light equally in all directions, the same total power is deposited on all spheres centered at the light.
We frequently would like to know the amount of light power being deposited on the differential area surrounding the intersection point p (Figure 1.5). We will assume that the point light source has some power Φ associated with it and that it radiates light equally in all directions. This means that the power per area on a unit sphere surrounding the light is Φ/(4π). (These measurements will be explained and formalized in Section 4.1.)
If we consider two such spheres (Figure 1.6), it is clear that the power per area at a point on the larger sphere must be less than the power at a point on the smaller sphere because the same total power is distributed over a larger area. Specifically, the power per area arriving at a point on a sphere of radius r is proportional to 1/r².
Furthermore, it can be shown that if the tiny surface patch dA is tilted by an angle θ away from the vector from the surface point to the light, the amount of power deposited on dA is proportional to cos θ. Putting this all together, the differential power per area dE (the differential irradiance) is

dE = Φ cos θ / (4π r²).
Figure 1.7: Scene with Thousands of Light Sources. This scene has far too many lights to consider all of them at each point where the reflected light is computed. Nevertheless, it can be rendered efficiently using stochastic sampling of light sources. (Scene courtesy of Beeple.)
Readers already familiar with basic lighting in computer graphics will notice two familiar laws encoded in this equation: the cosine falloff of light for tilted surfaces mentioned above, and the one-over-r-squared falloff of light with distance.
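As a small illustration, this relationship can be evaluated directly. The function below is a standalone sketch with Φ, r, and cos θ assumed to be known; it is not part of pbrt's light interfaces.

// Differential irradiance at a point a distance r from a point light with
// power Phi, whose surface normal is tilted by theta from the light direction.
#include <cmath>

constexpr float Pi = 3.14159265358979323846f;

float PointLightIrradiance(float Phi, float r, float cosTheta) {
    // dE = Phi * cos(theta) / (4 * pi * r^2): the cosine falloff combined with
    // the one-over-r-squared spreading of the light's power over a sphere.
    return Phi * cosTheta / (4 * Pi * r * r);
}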
Scenes with multiple lights are easily handled because illumination is linear: the contribution of each light can be computed separately and summed to obtain the overall contribution. An implication of the linearity of light is that sophisticated algorithms can be applied to randomly sample lighting from only some of the light sources at each shaded point in the scene; this is the topic of Section 12.6. Figure 1.7 shows a scene with thousands of light sources rendered in this way.
The lighting distribution described in the previous section ignores one very important component: shadows. Each light contributes illumination to the point being shaded only if the path from the point to the light’s position is unobstructed (Figure 1.8).
Fortunately, in a ray tracer it is easy to determine if the light is visible from the point being shaded. We simply construct a new ray whose origin is at the surface point and whose direction points toward the light. These special rays are called shadow rays. If we trace this ray through the environment, we can check to see whether any intersections are found between the ray’s origin and the light source by comparing the parametric t value of any intersections found to the parametric t value along the ray of the light source position. If there is no blocking object between the light and the surface, the light’s contribution is included.
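The sketch below illustrates this t-value comparison. The Pt type and the closestHit callable are placeholders invented for the example; pbrt's actual shadow-ray test, Integrator::IntersectP(), is introduced later in this chapter. A full implementation would also offset the ray endpoints slightly to avoid spurious self-intersections caused by floating-point error.

// Conceptual sketch of the shadow-ray visibility test described above. The
// intersection routine is passed in as a callable so the example is self-contained.
#include <functional>
#include <optional>

struct Pt { float x, y, z; };

// closestHit(origin, direction) returns the parametric t of the nearest hit
// along origin + t*direction, or nullopt if the ray hits nothing.
using ClosestHitFn =
    std::function<std::optional<float>(const Pt &, const Pt &)>;

bool Unoccluded(const Pt &p, const Pt &pLight, const ClosestHitFn &closestHit) {
    // Shadow ray from the shaded point toward the light; with this direction
    // the light itself sits at parametric distance t = 1.
    Pt d{pLight.x - p.x, pLight.y - p.y, pLight.z - p.z};
    std::optional<float> tHit = closestHit(p, d);
    // Any intersection with 0 < t < 1 lies between the point and the light.
    return !tHit || *tHit >= 1;
}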
1.2.5 LIGHT SCATTERING AT SURFACES
We are now able to compute two pieces of information that are vital for proper shading of a point: its location and the incident lighting. Now we need to determine how the incident lighting is scattered at the surface. Specifically, we are interested in the amount of light energy scattered back along the ray that we originally traced to find the intersection point, since that ray leads to the camera (Figure 1.9).
Figure 1.8: A light source only deposits energy on a surface if the source is not obscured as seen from the receiving point. The light source on the left illuminates the point p, but the light source on the right does not.
Figure 1.9: The Geometry of Surface Scattering. Incident light arriving along direction ωi interacts with the surface at point p and is scattered back toward the camera along direction ωo. The amount of light scattered toward the camera is given by the product of the incident light energy and the BRDF.
Each object in the scene provides a material, which is a description of its appearance properties at each point on the surface. This description is given by the bidirectional reflectance distribution function (BRDF). This function tells us how much energy is reflected from an incoming direction ωi to an outgoing direction ωo. We will write the BRDF at p as fr(p, ωo, ωi). (By convention, directions ω are unit vectors.)
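As a simple concrete example, the classic Lambertian model of a perfectly diffuse surface has a BRDF whose value is the same for every pair of directions. The sketch below is purely illustrative; pbrt's BxDF interface and its implementations are the subject of Chapter 9.

// The Lambertian (perfectly diffuse) BRDF: constant over all direction pairs.
constexpr float Pi = 3.14159265358979323846f;

struct RGB { float r, g, b; };

// reflectance ("albedo") is the fraction of incident light that is scattered;
// dividing by pi keeps the model energy-conserving when integrated over the
// hemisphere together with the cosine term.
RGB LambertianBRDF(RGB reflectance) {
    return {reflectance.r / Pi, reflectance.g / Pi, reflectance.b / Pi};
}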
It is easy to generalize the notion of a BRDF to transmitted light (obtaining a BTDF) or to general scattering of light arriving from either side of the surface. A function that describes general scattering is called a bidirectional scattering distribution function (BSDF). pbrt supports a variety of BSDF models; they are described in Chapter 9. More complex yet is the bidirectional scattering surface reflectance distribution function (BSSRDF), which models light that exits a surface at a different point than it enters. This is necessary to reproduce translucent materials such as milk, marble, or skin. The BSSRDF is described in Section 4.3.2. Figure 1.10 shows an image rendered by pbrt based on a model of a human head where scattering from the skin is modeled using a BSSRDF.
Figure 1.10: Head with Scattering Modeled Using a BSSRDF. Accurately modeling subsurface light transport rather than assuming that light exits the surface at the same point it entered greatly improves the realism of the rendered image. (Model courtesy of Infinite Realities, Inc.)
1.2.6 INDIRECT LIGHT TRANSPORT
Turner Whitted’s original paper on ray tracing (1980) emphasized its recursive nature, which was the key that made it possible to include indirect specular reflection and transmission in rendered images. For example, if a ray from the camera hits a shiny object like a mirror, we can reflect the ray about the surface normal at the intersection point and recursively invoke the ray-tracing routine to find the light arriving at the point on the mirror, adding its contribution to the original camera ray. This same technique can be used to trace transmitted rays that intersect transparent objects. Many early ray-tracing examples showcased mirrors and glass balls (Figure 1.11) because these types of effects were difficult to capture with other rendering techniques.
In general, the amount of light that reaches the camera from a point on an object is given by the sum of light emitted by the object (if it is itself a light source) and the amount of reflected light. This idea is formalized by the light transport equation (also often known as the rendering equation), which measures light with respect to radiance, a radiometric unit that will be defined in Section 4.1. It says that the outgoing radiance Lo(p, ωo) from a point p in direction ωo is the emitted radiance at that point in that direction, Le(p, ωo), plus the incident radiance from all directions on the sphere S² around p scaled by the BSDF f(p, ωo, ωi) and a cosine term:

Lo(p, ωo) = Le(p, ωo) + ∫_{S²} f(p, ωo, ωi) Li(p, ωi) |cos θi| dωi.     (1.1)
We will show a more complete derivation of this equation in Sections 4.3.1 and 13.1.1. Solving this integral analytically is not possible except for the simplest of scenes, so we must either make simplifying assumptions or use numerical integration techniques.
Figure 1.11: A Prototypical Early Ray Tracing Scene. Note the use of mirrored and glass objects, which emphasizes the algorithm’s ability to handle these kinds of surfaces. (a) Rendered using Whitted’s original ray-tracing algorithm from 1980, and (b) rendered using stochastic progressive photon mapping (SPPM), a modern advanced light transport algorithm. SPPM is able to accurately simulate the focusing of light that passes through the spheres.
Whitted’s ray-tracing algorithm simplifies this integral by ignoring incoming light from most directions and only evaluating Li(p, ωi) for directions to light sources and for the directions of perfect reflection and refraction. In other words, it turns the integral into a sum over a small number of directions. In Section 1.3.6, we will see that simple random sampling of Equation (1.1) can create realistic images that include both complex lighting and complex surface scattering effects. Throughout the remainder of the book, we will show how using more sophisticated random sampling algorithms greatly improves the efficiency of this general approach.
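As a preview of that approach, the standard Monte Carlo estimator (its general form is derived in Chapter 2) approximates the integral in Equation (1.1) with an average over N sampled directions ωj drawn from a probability density p(ωj) of the integrator's choosing:

Lo(p, ωo) ≈ Le(p, ωo) + (1/N) ∑_{j=1}^{N} f(p, ωo, ωj) Li(p, ωj) |cos θj| / p(ωj).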
The discussion so far has assumed that rays are traveling through a vacuum. For example, when describing the distribution of light from a point source, we assumed that the light’s power was distributed equally on the surface of a sphere centered at the light without decreasing along the way. The presence of participating media such as smoke, fog, or dust can invalidate this assumption. These effects are important to simulate: a wide class of interesting phenomena can be described using participating media. Figure 1.12 shows an explosion rendered by pbrt. Less dramatically, almost all outdoor scenes are affected substantially by participating media. For example, Earth’s atmosphere causes objects that are farther away to appear less saturated.
Figure 1.12: Explosion Modeled Using Participating Media. Because pbrt is capable of simulating light emission, scattering, and absorption in detailed models of participating media, it is capable of rendering images like this one. (Scene courtesy of Jim Price.)
There are two ways in which a participating medium can affect the light propagating along a ray. First, the medium can extinguish (or attenuate) light, either by absorbing it or by scattering it in a different direction. We can capture this effect by computing the transmittance Tr between the ray origin and the intersection point. The transmittance tells us how much of the light scattered at the intersection point makes it back to the ray origin.
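In the special case of a homogeneous medium, the transmittance reduces to the Beer–Lambert law. The sketch below assumes a constant attenuation coefficient sigma_t; pbrt's general Medium interface, which also handles spatially varying media, is described in Chapter 11.

// Transmittance along a ray segment through a homogeneous participating medium.
#include <cmath>

float HomogeneousTransmittance(float sigma_t, float distance) {
    // Fraction of light that survives travel over the given distance without
    // being absorbed or scattered out of the ray.
    return std::exp(-sigma_t * distance);
}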
A participating medium can also add to the light along a ray. This can happen either if the medium emits light (as with a flame) or if the medium scatters light from other directions back along the ray. We can find this quantity by numerically evaluating the volume light transport equation, in the same way we evaluated the light transport equation to find the amount of light reflected from a surface. We will leave the description of participating media and volume rendering until Chapters 11 and 14.
pbrt is structured using standard object-oriented techniques: for each of a number of fundamental types, the system specifies an interface that implementations of that type must fulfill. For example, pbrt requires the implementation of a particular shape that represents geometry in a scene to provide a set of methods including one that returns the shape’s bounding box, and another that tests for intersection with a given ray. In turn, the majority of the system can be implemented purely in terms of those interfaces; for example, the code that checks for occluding objects between a light source and a point being shaded calls the shape intersection methods without needing to consider which particular types of shapes are present in the scene.
There are a total of 14 of these key base types, summarized in Table 1.1. Adding a new implementation of one of these types to the system is straightforward; the implementation must provide the required methods, it must be compiled and linked into the executable, and the scene object creation routines must be modified to create instances of the object as needed as the scene description file is parsed. Section C.4 discusses extending the system in more detail.
Table 1.1: Main Interface Types. Most of pbrt is implemented in terms of 14 key base types, listed here. Implementations of each of these can easily be added to the system to extend its functionality.
Base type | Source Files
Spectrum | base/spectrum.h, util/spectrum.{h,cpp}
Camera | base/camera.h, cameras.{h,cpp}
Shape | base/shape.h, shapes.{h,cpp}
Primitive | cpu/{primitive,accelerators}.{h,cpp}
Sampler | base/sampler.h, samplers.{h,cpp}
Filter | base/filter.h, filters.{h,cpp}
BxDF | base/bxdf.h, bxdfs.{h,cpp}
Material | base/material.h, materials.{h,cpp}
FloatTexture, SpectrumTexture | base/texture.h, textures.{h,cpp}
Medium | base/medium.h, media.{h,cpp}
Light | base/light.h, lights.{h,cpp}
LightSampler | base/lightsampler.h, lightsamplers.{h,cpp}
Integrator | cpu/integrators.{h,cpp}
BxDF 538
Camera 206
Filter 515
FloatTexture 656
Integrator 22
Light 740
LightSampler 781
Material 674
Medium 714
Primitive 398
Sampler 469
Shape 261
Spectrum 165
SpectrumTexture 656
Conventional practice in C++ would be to specify the interfaces for each of these types using abstract base classes that define pure virtual functions and to have implementations inherit from those base classes and implement the required virtual functions. In turn, the compiler would take care of generating the code that calls the appropriate method, given a pointer to any object of the base class type. That approach was used in the three previous versions of pbrt, but the addition of support for rendering on graphics processing units (GPUs) in this version motivated a more portable approach based on tag-based dispatch, where each specific type implementation is assigned a unique integer that determines its type at runtime. (See Section 1.5.7 for more information about this topic.) The polymorphic types that are implemented in this way in pbrt are all defined in header files in the base/ directory.
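As a conceptual analogue, and not pbrt's actual mechanism, tag-based dispatch can be illustrated with std::variant, which likewise stores an integer tag identifying the concrete type and selects the matching code at runtime without virtual functions. The shape types below are invented for the example.

// Illustration of tag-based dispatch via std::variant (C++17).
#include <cstdio>
#include <variant>

struct SphereShape   { float Area() const { return 1.0f; } };
struct TriangleShape { float Area() const { return 0.5f; } };

// The "interface" is the set of alternatives; no virtual functions are needed.
using ShapeHandle = std::variant<SphereShape, TriangleShape>;

float Area(const ShapeHandle &shape) {
    // std::visit switches on the stored tag and calls the matching overload.
    return std::visit([](const auto &s) { return s.Area(); }, shape);
}

int main() {
    ShapeHandle s = TriangleShape{};
    std::printf("area = %f\n", Area(s));
}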
This version of pbrt is capable of running on GPUs that support C++17 and provide APIs for ray intersection tests.4 We have carefully designed the system so that almost all of pbrt’s implementation runs on both CPUs and GPUs, just as it is presented in Chapters 2 through 12. We will therefore generally say little about the CPU versus the GPU in most of the following.
The main differences between the CPU and GPU rendering paths in pbrt are in their data flow and how they are parallelized—effectively, how the pieces are connected together. Both the basic rendering algorithm described later in this chapter and the light transport algorithms described in Chapters 13 and 14 are only available on the CPU. The GPU rendering pipeline is discussed in Chapter 15, though it, too, is also capable of running on the CPU (not as efficiently as the CPU-targeted light transport algorithms, however).
While pbrt can render many scenes well with its current implementation, it has frequently been extended by students, researchers, and developers. Throughout this section are a number of notable images from those efforts. Figures 1.13, 1.14, and 1.15 were each created by students in a rendering course where the final class project was to extend pbrt with new functionality in order to render an image that it could not have rendered before. These images are among the best from that course.
pbrt can be conceptually divided into three phases of execution. First, it parses the scene description file provided by the user. The scene description is a text file that specifies the geometric shapes that make up the scene, their material properties, the lights that illuminate them, where the virtual camera is positioned in the scene, and parameters to all the individual algorithms used throughout the system. The scene file format is documented on the pbrt website, pbrt.org.
The result of the parsing phase is an instance of the BasicScene class, which stores the scene specification, but not in a form yet suitable for rendering. In the second phase of execution, pbrt creates specific objects corresponding to the scene; for example, if a perspective projection has been specified, it is in this phase that a PerspectiveCamera object corresponding to the specified viewing parameters is created. Previous versions of pbrt intermixed these first two phases, but for this version we have separated them because the CPU and GPU rendering paths differ in some of the ways that they represent the scene in memory.
BasicScene 1134
PerspectiveCamera 220
Figure 1.13: Guillaume Poncin and Pramod Sharma extended pbrt in numerous ways, implementing a number of complex rendering algorithms, to make this prize-winning image for Stanford’s CS348b rendering competition. The trees are modeled procedurally with L-systems, a glow image processing filter increases the apparent realism of the lights on the tree, snow was modeled procedurally with metaballs, and a subsurface scattering algorithm gave the snow its realistic appearance by accounting for the effect of light that travels beneath the snow for some distance before leaving it.
In the third phase, the main rendering loop executes. This phase is where pbrt usually spends the majority of its running time, and most of this book is devoted to code that executes during this phase. To orchestrate the rendering, pbrt implements an integrator, so-named because its main task is to evaluate the integral in Equation (1.1).
The main() function for the pbrt executable is defined in the file cmd/pbrt.cpp in the directory that holds the pbrt source code, src/pbrt in the pbrt distribution. It is only a hundred and fifty or so lines of code, much of it devoted to processing command-line arguments and related bookkeeping.
〈main program〉 ≡
int main(int argc, char *argv[]) {
〈Convert command-line arguments to vector of strings 19〉
〈Declare variables for parsed command line 19〉
〈Process command-line arguments〉
〈Initialize pbrt 20〉
〈Parse provided scene description files 20〉
〈Render the scene 21〉
〈Clean up after rendering the scene 21〉
}
Rather than operate on the argv values provided to the main() function directly, pbrt converts the provided arguments to a vector of std::strings. It does so not only for the greater convenience of the string class, but also to support non-ASCII character sets. Section B.3.2 has more information about character encodings and how they are handled in pbrt.
〈Convert command-line arguments to vector of strings〉 ≡
    std::vector<std::string> args = GetCommandLineArguments(argv);
Figure 1.14: Abe Davis, David Jacobs, and Jongmin Baek rendered this amazing image of an ice cave to take the grand prize in the 2009 Stanford CS348b rendering competition. They first implemented a simulation of the physical process of glaciation, the process where snow falls, melts, and refreezes over the course of many years, forming stratified layers of ice. They then simulated erosion of the ice due to melted water runoff before generating a geometric model of the ice. Scattering of light inside the volume was simulated with volumetric photon mapping; the blue color of the ice is entirely due to modeling the wavelength-dependent absorption of light in the ice volume.
We will only include the definitions of some of the main function’s fragments in the book text here. Some, such as the one that handles parsing command-line arguments provided by the user, are both simple enough and long enough that they are not worth the few pages that they would add to the book’s length. However, we will include the fragment that declares the variables in which the option values are stored.
GetCommandLineArguments() 1063
PBRTOptions 1032
〈Declare variables for parsed command line〉 ≡
    PBRTOptions options;
    std::vector<std::string> filenames;
Figure 1.15: Chenlin Meng, Hubert Teo, and Jiren Zhu rendered this tasty-looking image of cotton candy in a teacup to win the grand prize in the 2018 Stanford CS348b rendering competition. They modeled the cotton candy with multiple layers of curves and then filled the center with a participating medium to efficiently model scattering in its interior.
The GetCommandLineArguments() function and PBRTOptions type appear in a mini-index in the page margin, along with the number of the page where they are defined. The mini-indices have pointers to the definitions of almost all the functions, classes, methods, and member variables used or referred to on each page. (In the interests of brevity, we will omit very widely used classes such as Ray from the mini-indices, as well as types or methods that were just introduced in the preceding few pages.)
The PBRTOptions class stores various rendering options that are generally more suited to be specified on the command line rather than in scene description files—for example, how chatty pbrt should be about its progress during rendering. It is passed to the InitPBRT() function, which aggregates the various system-wide initialization tasks that must be performed before any other work is done. For example, it initializes the logging system and launches a group of threads that are used for the parallelization of pbrt.
〈Initialize pbrt〉 ≡
    InitPBRT(options);
After the arguments have been parsed and validated, the ParseFiles() function takes over to handle the first of the three phases of execution described earlier. With the assistance of two classes, BasicSceneBuilder and BasicScene, which are respectively described in Sections C.2 and C.3, it loops over the provided filenames, parsing each file in turn. If pbrt is run with no filenames provided, it looks for the scene description from standard input. The mechanics of tokenizing and parsing scene description files will not be described in this book, but the parser implementation can be found in the files parser.h and parser.cpp in the src/pbrt directory.
〈Parse provided scene description files〉 ≡
    BasicScene scene;
    BasicSceneBuilder builder(&scene);
    ParseFiles(&builder, filenames);
BasicScene 1134
BasicSceneBuilder 1123
GetCommandLineArguments() 1063
InitPBRT() 1032
ParseFiles() 1120
PBRTOptions 1032
RenderWavefront() 927
After the scene description has been parsed, one of two functions is called to render the scene. RenderWavefront() supports both the CPU and GPU rendering paths, processing a million or so image samples in parallel. It is the topic of Chapter 15. RenderCPU() renders the scene using an Integrator implementation and is only available when running on the CPU. It uses much less parallelism than RenderWavefront(), rendering only as many image samples as there are CPU threads in parallel.
Figure 1.16: Martin Lubich modeled this scene of the Austrian Imperial Crown using Blender; it was originally rendered using LuxRender, which started out as a fork of the pbrt-v1 codebase. The crown consists of approximately 3.5 million triangles that are illuminated by six area light sources with emission spectra based on measured data from a real-world light source. It was originally rendered with 1280 samples per pixel in 73 hours of computation on a quad-core CPU. On a modern GPU, pbrt renders this scene at the same sampling rate in 184 seconds.
Both of these functions start by converting the BasicScene into a form suitable for efficient rendering and then pass control to a processor-specific integrator. (More information about this process is available in Section C.3.) We will gloss over the details of this transformation for now in order to focus on the main rendering loop in RenderCPU(), which is much more interesting. For that, we will take the efficient scene representation as a given.
〈Render the scene〉 ≡
    if (Options->useGPU || Options->wavefront)
        RenderWavefront(scene);
    else
        RenderCPU(scene);
BasicPBRTOptions::useGPU 1031
BasicPBRTOptions::wavefront 1031
BasicScene 1134
CleanupPBRT() 1032
InitPBRT() 1032
Integrator 22
Options 1032
RenderCPU() 20
RenderWavefront() 927
After the image has been rendered, CleanupPBRT() takes care of shutting the system down gracefully, including, for example, terminating the threads launched by InitPBRT().
〈Clean up after rendering the scene〉 ≡
    CleanupPBRT();
In the RenderCPU() rendering path, an instance of a class that implements the Integrator interface is responsible for rendering. Because Integrator implementations only run on the CPU, we will define Integrator as a standard base class with pure virtual methods. Integrator and the various implementations are each defined in the files cpu/integrator.h and cpu/integrator.cpp.
〈Integrator Definition〉 ≡
class Integrator {
public:
〈Integrator Public Methods 23〉
〈Integrator Public Members 22〉
protected:
〈Integrator Protected Methods 22〉
};
The base Integrator constructor takes a single Primitive that represents all the geometric objects in the scene as well as an array that holds all the lights in the scene.
〈Integrator Protected Methods〉 ≡
    Integrator(Primitive aggregate, std::vector<Light> lights)
        : aggregate(aggregate), lights(lights) {
        〈Integrator constructor implementation 23〉
    }
Each geometric object in the scene is represented by a Primitive, which is primarily responsible for combining a Shape that specifies its geometry and a Material that describes its appearance (e.g., the object’s color, or whether it has a dull or glossy finish). In turn, all the geometric primitives in a scene are collected into a single aggregate primitive that is stored in the Integrator::aggregate member variable. This aggregate is a special kind of primitive that itself holds references to many other primitives. The aggregate implementation stores all the scene’s primitives in an acceleration data structure that reduces the number of unnecessary ray intersection tests with primitives that are far away from a given ray. Because it implements the Primitive interface, it appears no different from a single primitive to the rest of the system.
〈Integrator Public Members〉 ≡
    Primitive aggregate;
    std::vector<Light> lights;
Each light source in the scene is represented by an object that implements the Light interface, which allows the light to specify its shape and the distribution of energy that it emits. Some lights need to know the bounding box of the entire scene, which is unavailable when they are first created. Therefore, the Integrator constructor calls their Preprocess() methods, providing those bounds. At this point any “infinite” lights are also stored in a separate array. This sort of light, which will be introduced in Section 12.5, models infinitely far away sources of light, which is a reasonable model for skylight as received on Earth’s surface, for example. Sometimes it will be necessary to loop over just those lights, and for scenes with thousands of light sources it would be inefficient to loop over all of them just to find those.
Integrator 22
Integrator::aggregate 22
Light 740
Material 674
Primitive 398
RenderCPU() 20
Shape 261
〈Integrator constructor implementation〉 ≡
    Bounds3f sceneBounds = aggregate ? aggregate.Bounds() : Bounds3f();
    for (auto &light : lights) {
        light.Preprocess(sceneBounds);
        if (light.Type() == LightType::Infinite)
            infiniteLights.push_back(light);
    }

〈Integrator Public Members〉 +≡
    std::vector<Light> infiniteLights;
Integrators must provide an implementation of the Render() method, which takes no further arguments. This method is called by the RenderCPU() function once the scene representation has been initialized. The task of integrators is to render the scene as specified by the aggregate and the lights. Beyond that, it is up to the specific integrator to define what it means to render the scene, using whichever other classes that it needs to do so (e.g., a camera model). This interface is intentionally very general to permit a wide range of implementations—for example, one could implement an Integrator that measures light only at a sparse set of points distributed through the scene rather than generating a regular 2D image.
〈Integrator Public Methods〉 ≡
    virtual void Render() = 0;
The Integrator class provides two methods related to ray–primitive intersection for use by its subclasses. Intersect() takes a ray and a maximum parametric distance tMax, traces the given ray into the scene, and returns a ShapeIntersection object corresponding to the closest primitive that the ray hit, if there is an intersection along the ray before tMax. (The ShapeIntersection structure is defined in Section 6.1.3.) One thing to note is that this method uses the type pstd::optional for the return value rather than std::optional from the C++ standard library; we have reimplemented parts of the standard library in the pstd namespace for reasons that are discussed in Section 1.5.5.
〈Integrator Method Definitions〉 ≡
pstd::optional<ShapeIntersection>
Integrator::Intersect(const Ray &ray, Float tMax) const {
if (aggregate) return aggregate.Intersect(ray, tMax);
else return {};
}
Bounds3f 97
Float 23
Integrator 22
Integrator::aggregate 22
Integrator::infiniteLights 23
Integrator::IntersectP() 24
Integrator::lights 22
Light 740
Light::Preprocess() 743
Light::Type() 740
LightType 740
LightType::Infinite 740
Primitive::Bounds() 398
Primitive::Intersect() 398
Ray 95
RenderCPU() 20
ShapeIntersection 266
Also note the capitalized floating-point type Float in Intersect()’s signature: almost all floating-point values in pbrt are declared as Floats. (The only exceptions are a few cases where a 32-bit float or a 64-bit double is specifically needed (e.g., when saving binary values to files).) Depending on the compilation flags of pbrt, Float is an alias for either float or double, though single precision float is almost always sufficient in practice. The definition of Float is in the pbrt.h header file, which is included by all other source files in pbrt.
〈Float Type Definitions〉 ≡
#ifdef PBRT_FLOAT_AS_DOUBLE
using Float = double;
#else
using Float = float;
#endif
Integrator::IntersectP() is closely related to the Intersect() method. It checks for the existence of intersections along the ray but only returns a Boolean indicating whether an intersection was found. (The “P” in its name indicates that it is a function that evaluates a predicate, using a common naming convention from the Lisp programming language.) Because it does not need to search for the closest intersection or return additional geometric information about intersections, IntersectP() is generally more efficient than Integrator::Intersect(). This routine is used for shadow rays.
〈Integrator Method Definitions〉 +≡
bool Integrator::IntersectP(const Ray &ray, Float tMax) const {
if (aggregate) return aggregate.IntersectP(ray, tMax);
else return false;
}
1.3.4 ImageTileIntegrator AND THE MAIN RENDERING LOOP
Before implementing a basic integrator that simulates light transport to render an image, we will define two Integrator subclasses that provide additional common functionality used by that integrator as well as many of the integrator implementations to come. We start with ImageTileIntegrator, which inherits from Integrator. The next section defines RayIntegrator, which inherits from ImageTileIntegrator.
All of pbrt’s CPU-based integrators render images using a camera model to define the viewing parameters, and all parallelize rendering by splitting the image into tiles and having different processors work on different tiles. Therefore, pbrt includes an ImageTileIntegrator that provides common functionality for those tasks.
〈ImageTileIntegrator Definition〉 ≡
class ImageTileIntegrator : public Integrator {
public:
〈ImageTileIntegrator Public Methods 24〉
protected:
〈ImageTileIntegrator Protected Members 25〉
};
In addition to the aggregate and the lights, the ImageTileIntegrator constructor takes a Camera that specifies the viewing and lens parameters such as position, orientation, focus, and field of view. Film stored by the camera handles image storage. The Camera classes are the subject of most of Chapter 5, and Film is described in Section 5.4. The Film is responsible for writing the final image to a file.
Camera 206
Film 244
Float 23
ImageTileIntegrator 24
ImageTileIntegrator::camera 25
ImageTileIntegrator:: samplerPrototype 25
Integrator 22
Integrator::aggregate 22
Integrator::Intersect() 23
Light 740
Primitive 398
Primitive::IntersectP() 398
Ray 95
RayIntegrator 28
Sampler 469
The constructor also takes a Sampler; its role is more subtle, but its implementation can substantially affect the quality of the images that the system generates. First, the sampler is responsible for choosing the points on the image plane that determine which rays are initially traced into the scene. Second, it is responsible for supplying random sample values that are used by integrators for estimating the value of the light transport integral, Equation (1.1). For example, some integrators need to choose random points on light sources to compute illumination from area lights. Generating a good distribution of these samples is an important part of the rendering process that can substantially affect overall efficiency; this topic is the main focus of Chapter 8.
〈ImageTileIntegrator Public Methods〉 ≡
    ImageTileIntegrator(Camera camera, Sampler sampler, Primitive aggregate,
                        std::vector<Light> lights)
        : Integrator(aggregate, lights), camera(camera),
          samplerPrototype(sampler) {}

〈ImageTileIntegrator Protected Members〉 ≡
    Camera camera;
    Sampler samplerPrototype;
For all of pbrt’s integrators, the final color computed at each pixel is based on random sampling algorithms. If each pixel’s final value is computed as the average of multiple samples, then the quality of the image improves. At low numbers of samples, sampling error manifests itself as grainy high-frequency noise in images, though error goes down at a predictable rate as the number of samples increases. (This topic is discussed in more depth in Section 2.1.4.) ImageTileIntegrator::Render() therefore renders the image in waves of a few samples per pixel. For the first two waves, only a single sample is taken in each pixel. In the next wave, two samples are taken, with the number of samples doubling after each wave up to a limit. While it makes no difference to the final image if the image was rendered in waves or with all the samples being taken in a pixel before moving on to the next one, this organization of the computation means that it is possible to see previews of the final image during rendering where all pixels have some samples, rather than a few pixels having many samples and the rest having none.
Because pbrt is parallelized to run using multiple threads, there is a balance to be struck with this approach. There is a cost for threads to acquire work for a new image tile, and some threads end up idle at the end of each wave once there is no more work for them to do but other threads are still working on the tiles they have been assigned. These considerations motivated the capped doubling approach.
〈ImageTileIntegrator Method Definitions〉 ≡
void ImageTileIntegrator::Render() {
〈Declare common variables for rendering image in tiles 25〉
〈Render image in waves 26〉
}
Before rendering begins, a few additional variables are required. First, the integrator implementations will need to allocate small amounts of temporary memory to store surface scattering properties in the course of computing each ray’s contribution. The large number of resulting allocations could easily overwhelm the system’s regular memory allocation routines (e.g., new), which must coordinate multi-threaded maintenance of elaborate data structures to track free memory. A naive implementation could potentially spend a fairly large fraction of its computation time in the memory allocator.
To address this issue, pbrt provides a ScratchBuffer class that manages a small preallocated buffer of memory. ScratchBuffer allocations are very efficient, just requiring the increment of an offset. The ScratchBuffer does not allow independently freeing allocations; instead, all must be freed at once, but doing so only requires resetting that offset.
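The core of such an allocator is small. The sketch below shows the idea under simplifying assumptions (raw byte allocation, a fixed capacity, and no alignment handling) and is not pbrt's ScratchBuffer implementation.

// Minimal bump-allocator sketch in the spirit of ScratchBuffer: allocation just
// advances an offset into a preallocated buffer, and Reset() releases everything.
#include <cstddef>
#include <vector>

class BumpBuffer {
  public:
    explicit BumpBuffer(std::size_t capacity) : storage(capacity) {}

    // Returns a pointer to `size` bytes, or nullptr if the buffer is exhausted.
    void *Alloc(std::size_t size) {
        if (offset + size > storage.size()) return nullptr;
        void *ptr = storage.data() + offset;
        offset += size;
        return ptr;
    }

    // Frees every allocation made since the last Reset() in one step.
    void Reset() { offset = 0; }

  private:
    std::vector<std::byte> storage;
    std::size_t offset = 0;
};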
Because ScratchBuffers are not safe for use by multiple threads at the same time, an individual one is created for each thread using the ThreadLocal template class. Its constructor takes a lambda function that returns a fresh instance of the object of the type it manages; here, calling the default ScratchBuffer constructor is sufficient. ThreadLocal then handles the details of maintaining distinct copies of the object for each thread, allocating them on demand.
〈Declare common variables for rendering image in tiles〉 ≡
    ThreadLocal<ScratchBuffer> scratchBuffers(
        []() { return ScratchBuffer(); });
Camera 206
Sampler 469
ScratchBuffer 1078
ThreadLocal 1112
Most Sampler implementations find it useful to maintain some state, such as the coordinates of the current pixel. This means that multiple threads cannot use a single Sampler concurrently, so ThreadLocal is also used for Sampler management. Samplers provide a Clone() method that creates a new instance of their sampler type. The Sampler originally provided to the ImageTileIntegrator constructor, stored in samplerPrototype, is used to create those copies here.
〈Declare common variables for rendering image in tiles〉 +≡
    ThreadLocal<Sampler> samplers(
        [this]() { return samplerPrototype.Clone(); });
It is helpful to provide the user with an indication of how much of the rendering work is done and an estimate of how much longer it will take. This task is handled by the ProgressReporter class, which takes as its first parameter the total number of items of work. Here, the total amount of work is the number of samples taken in each pixel times the total number of pixels. It is important to use 64-bit precision to compute this value, since a 32-bit int may be insufficient for high-resolution images with many samples per pixel.
〈Declare common variables for rendering image in tiles〉 +≡
    Bounds2i pixelBounds = camera.GetFilm().PixelBounds();
    int spp = samplerPrototype.SamplesPerPixel();
    ProgressReporter progress(int64_t(spp) * pixelBounds.Area(), "Rendering",
                              Options->quiet);
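For example, a 4096 × 4096 image rendered with 256 samples per pixel represents 4096 × 4096 × 256 = 4,294,967,296 units of work, which already exceeds the largest value (2,147,483,647) representable by a signed 32-bit int.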
In the following, the range of samples to be taken in the current wave is given by waveStart and waveEnd; nextWaveSize gives the number of samples to be taken in the next wave.
〈Declare common variables for rendering image in tiles〉 +≡
    int waveStart = 0, waveEnd = 1, nextWaveSize = 1;
With these variables in hand, rendering proceeds until the required number of samples have been taken in all pixels.
〈Render image in waves〉 ≡
    while (waveStart < spp) {
        〈Render current wave’s image tiles in parallel 27〉
        〈Update start and end wave 28〉
        〈Optionally write current image to disk〉
    }
BasicPBRTOptions::quiet 1031
Bounds2::Area() 102
Bounds2i 97
Camera::GetFilm() 207
Film::PixelBounds() 246
ImageTileIntegrator 24
ImageTileIntegrator::camera 25
ImageTileIntegrator:: samplerPrototype 25
Options 1032
ParallelFor2D() 1108
ProgressReporter 1068
Sampler 469
Sampler::Clone() 470
Sampler::SamplesPerPixel() 469
ThreadLocal 1112
The ParallelFor2D() function loops over image tiles, running multiple loop iterations concurrently; it is part of the parallelism-related utility functions that are introduced in Section B.6. A C++ lambda expression provides the loop body. ParallelFor2D() automatically chooses a tile size to balance two concerns: on one hand, we would like to have significantly more tiles than there are processors in the system. It is likely that some of the tiles will take less processing time than others, so if there were, for example, a 1:1 mapping between processors and tiles, some processors would be idle after finishing their work while others continued to work on their regions of the image. (Figure 1.17 graphs the distribution of time taken to render tiles of an example image, illustrating this concern.) On the other hand, having too many tiles also hurts efficiency. There is a small fixed overhead for a thread to acquire more work in the parallel for loop, and the more tiles there are, the more times this overhead must be paid. ParallelFor2D() therefore chooses a tile size that accounts for both the extent of the region to be processed and the number of processors in the system.
〈Render current wave’s image tiles in parallel〉 ≡
    ParallelFor2D(pixelBounds, [&](Bounds2i tileBounds) {
        〈Render image tile given by tileBounds 27〉
    });
Figure 1.17: Histogram of Time Spent Rendering Each Tile for the Scene in Figure 1.11. The horizontal axis measures time in seconds. Note the wide variation in execution time, illustrating that different parts of the image required substantially different amounts of computation.
Given a tile to render, the implementation starts by acquiring the ScratchBuffer and Sampler for the currently executing thread. As described earlier, the ThreadLocal::Get() method takes care of the details of allocating and returning a separate instance of each for the calling thread.
With those in hand, the implementation loops over all the pixels in the tile using a range-based for loop that uses iterators provided by the Bounds2 class before informing the ProgressReporter about how much work has been completed.
〈Render image tile given by tileBounds〉 ≡
    ScratchBuffer &scratchBuffer = scratchBuffers.Get();
    Sampler &sampler = samplers.Get();
    for (Point2i pPixel : tileBounds) {
        〈Render samples in pixel pPixel 28〉
    }
    progress.Update((waveEnd - waveStart) * tileBounds.Area());
Bounds2 97
Bounds2::Area() 102
Bounds2i 97
ParallelFor2D() 1108
Point2i 92
ProgressReporter 1068
ProgressReporter::Update() 1068
Sampler 469
ScratchBuffer 1078
ThreadLocal::Get() 1112
Given a pixel to take one or more samples in, the thread’s Sampler is notified that it should start generating samples for the current pixel via StartPixelSample(), which allows it to set up any internal state that depends on which pixel is currently being processed. The integrator’s EvaluatePixelSample() method is then responsible for determining the specified sample’s value, after which any temporary memory it may have allocated in the ScratchBuffer is freed with a call to ScratchBuffer::Reset().
〈Render samples in pixel pPixel〉 ≡
    for (int sampleIndex = waveStart; sampleIndex < waveEnd; ++sampleIndex) {
        sampler.StartPixelSample(pPixel, sampleIndex);
        EvaluatePixelSample(pPixel, sampleIndex, sampler, scratchBuffer);
        scratchBuffer.Reset();
    }
Having provided an implementation of the pure virtual Integrator::Render() method, ImageTileIntegrator now imposes the requirement on its subclasses that they implement the following EvaluatePixelSample() method.
〈ImageTileIntegrator Public Methods〉 +≡
    virtual void EvaluatePixelSample(Point2i pPixel, int sampleIndex,
                                     Sampler sampler,
                                     ScratchBuffer &scratchBuffer) = 0;
After the parallel for loop for the current wave completes, the range of sample indices to be processed in the next wave is computed.
〈Update start and end wave〉 ≡
    waveStart = waveEnd;
    waveEnd = std::min(spp, waveEnd + nextWaveSize);
    nextWaveSize = std::min(2 * nextWaveSize, 64);
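With this schedule, and assuming a sufficiently large sample count, the successive waves take 1, 1, 2, 4, 8, 16, 32, and then 64 samples per pixel, with every subsequent wave capped at 64 samples until the requested total is reached.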
If the user has provided the --write-partial-images command-line option, the in-progress image is written to disk before the next wave of samples is processed. We will not include here the fragment that takes care of this, 〈Optionally write current image to disk〉.
1.3.5 RayIntegrator IMPLEMENTATION
Just as the ImageTileIntegrator centralizes functionality related to integrators that decompose the image into tiles, RayIntegrator provides commonly used functionality to integrators that trace ray paths starting from the camera. All of the integrators implemented in Chapters 13 and 14 inherit from RayIntegrator.
〈RayIntegrator Definition〉 ≡
class RayIntegrator : public ImageTileIntegrator {
public:
〈RayIntegrator Public Methods 28〉
};
Camera 206
Film 244
ImageTileIntegrator 24
ImageTileIntegrator:: EvaluatePixelSample() 28
Integrator::Render() 23
Light 740
Point2i 92
Primitive 398
RayIntegrator 28
Sampler 469
Sampler::StartPixelSample() 469
ScratchBuffer 1078
ScratchBuffer::Reset() 1079
Its constructor does nothing more than pass along the provided objects to the ImageTile Integrator constructor.
〈RayIntegrator Public Methods〉 ≡
    RayIntegrator(Camera camera, Sampler sampler, Primitive aggregate,
                  std::vector<Light> lights)
        : ImageTileIntegrator(camera, sampler, aggregate, lights) {}
RayIntegrator implements the pure virtual EvaluatePixelSample() method from ImageTile Integrator. At the given pixel, it uses its Camera and Sampler to generate a ray into the scene and then calls the Li() method, which is provided by the subclass, to determine the amount of light arriving at the image plane along that ray. As we will see in following chapters, the units of the value returned by this method are related to the incident spectral radiance at the ray origin, which is generally denoted by the symbol Li in equations—thus, the method name. This value is passed to the Film, which records the ray’s contribution to the image.
Figure 1.18: Class Relationships for RayIntegrator::EvaluatePixelSample()’s computation. The Sampler provides sample values for each image sample to be taken. The Camera turns a sample into a corresponding ray from the film plane, and the Li() method computes the radiance along that ray arriving at the film. The sample and its radiance are passed to the Film, which stores their contribution in an image.
Figure 1.18 summarizes the main classes used in this method and the flow of data among them.
〈RayIntegrator Method Definitions〉 ≡
void RayIntegrator::EvaluatePixelSample(Point2i pPixel, int sampleIndex,
Sampler sampler, ScratchBuffer &scratchBuffer) {
〈Sample wavelengths for the ray 29〉
〈Initialize CameraSample for current sample 30〉
〈Generate camera ray for current sample 30〉
〈Trace cameraRay if valid 30〉
〈Add camera ray’s contribution to image 31〉
}
Each ray carries radiance at a number of discrete wavelengths λ (four, by default). When computing the color at each pixel, pbrt chooses different wavelengths at different pixel samples so that the final result better reflects the correct result over all wavelengths. To choose these wavelengths, a sample value lu is first provided by the Sampler. This value will be uniformly distributed and in the range [0, 1). The Film::SampleWavelengths() method then maps this sample to a set of specific wavelengths, taking into account its model of film sensor response as a function of wavelength. Most Sampler implementations ensure that if multiple samples are taken in a pixel, those samples are in the aggregate well distributed over [0, 1). In turn, they ensure that the sampled wavelengths are also well distributed across the range of valid wavelengths, improving image quality.
〈Sample wavelengths for the ray〉 ≡
    Float lu = sampler.Get1D();
    SampledWavelengths lambda = camera.GetFilm().SampleWavelengths(lu);
Camera 206
Camera::GetFilm() 207
CameraSample 206
Film 244
Film::SampleWavelengths() 246
Float 23
GetCameraSample() 516
ImageTileIntegrator::camera 25
Point2i 92
SampledWavelengths 173
Sampler 469
Sampler::Get1D() 470
ScratchBuffer 1078
The CameraSample structure records the position on the film for which the camera should generate a ray. This position is affected by both a sample position provided by the sampler and the reconstruction filter that is used to filter multiple sample values into a single value for the pixel. GetCameraSample() handles those calculations. CameraSample also stores a time that is associated with the ray as well as a lens position sample, which are used when rendering scenes with moving objects and for camera models that simulate non-pinhole apertures, respectively.
〈Initialize CameraSample for current sample〉 ≡
    Filter filter = camera.GetFilm().GetFilter();
    CameraSample cameraSample = GetCameraSample(sampler, pPixel, filter);
The Camera interface provides two methods to generate rays: GenerateRay(), which returns the ray for a given image sample position, and GenerateRayDifferential(), which returns a ray differential, which incorporates information about the rays that the camera would generate for samples that are one pixel away on the image plane in both the x and y directions. Ray differentials are used to get better results from some of the texture functions defined in Chapter 10, by making it possible to compute how quickly a texture varies with respect to the pixel spacing, which is a key component of texture antialiasing.
Some CameraSample values may not correspond to valid rays for a given camera. Therefore, pstd::optional is used for the CameraRayDifferential returned by the camera.
〈Generate camera ray for current sample〉 ≡
    pstd::optional<CameraRayDifferential> cameraRay =
        camera.GenerateRayDifferential(cameraSample, lambda);
If the camera ray is valid, it is passed along to the RayIntegrator subclass’s Li() method implementation after some additional preparation. In addition to returning the radiance along the ray L, the subclass is also responsible for initializing an instance of the VisibleSurface class, which records geometric information about the surface the ray intersects (if any) at each pixel for the use of Film implementations like the GBufferFilm that store more information than just color at each pixel.
〈Trace cameraRay if valid〉 ≡
    SampledSpectrum L(0.);
    VisibleSurface visibleSurface;
    if (cameraRay) {
        〈Scale camera ray differentials based on image sampling rate 30〉
        〈Evaluate radiance along camera ray 31〉
        〈Issue warning if unexpected radiance value is returned〉
    }
Camera 206
Camera:: GenerateRayDifferential() 207
Camera::GetFilm() 207
CameraRayDifferential 207
CameraSample 206
Film::GetFilter() 246
Filter 515
Float 23
GBufferFilm 253
GetCameraSample() 516
ImageTileIntegrator::camera 25
RayDifferential:: ScaleDifferentials() 97
RayIntegrator 28
SampledSpectrum 171
Sampler::SamplesPerPixel() 469
VisibleSurface 245
Before the ray is passed to the Li() method, the ScaleDifferentials() method scales the differential rays to account for the actual spacing between samples on the film plane when multiple samples are taken per pixel.
〈Scale camera ray differentials based on image sampling rate〉 ≡
    Float rayDiffScale =
        std::max<Float>(.125f, 1 / std::sqrt((Float)sampler.SamplesPerPixel()));
    cameraRay->ray.ScaleDifferentials(rayDiffScale);
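For example, at 16 samples per pixel the differentials are scaled by 1/√16 = 0.25, whereas at 256 samples per pixel the unclamped value 1/√256 = 0.0625 falls below the 0.125 floor, so the differentials are scaled by 0.125.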
For Film implementations that do not store geometric information at each pixel, it is worth saving the work of populating the VisibleSurface class. Therefore, a pointer to this class is only passed in the call to the Li() method if it is necessary, and a null pointer is passed otherwise. Integrator implementations then should only initialize the VisibleSurface if it is non-null.
CameraRayDifferential also carries a weight associated with the ray that is used to scale the returned radiance value. For simple camera models, each ray is weighted equally, but camera models that more accurately simulate the process of image formation by lens systems may generate some rays that contribute more than others. Such a camera model might simulate the effect of less light arriving at the edges of the film plane than at the center, an effect called vignetting.
〈Evaluate radiance along camera ray〉 ≡
    bool initializeVisibleSurface = camera.GetFilm().UsesVisibleSurface();
    L = cameraRay->weight *
        Li(cameraRay->ray, lambda, sampler, scratchBuffer,
           initializeVisibleSurface ? &visibleSurface : nullptr);
Li() is a pure virtual method that RayIntegrator subclasses must implement. It returns the incident radiance at the origin of a given ray, sampled at the specified wavelengths.
〈RayIntegrator Public Methods〉 +≡
    virtual SampledSpectrum Li(RayDifferential ray, SampledWavelengths &lambda,
                               Sampler sampler, ScratchBuffer &scratchBuffer,
                               VisibleSurface *visibleSurface) const = 0;
A common side effect of bugs in the rendering process is that impossible radiance values are computed. For example, division by zero results in radiance values equal to either the IEEE floating-point infinity or a “not a number” value. The renderer looks for these possibilities and prints an error message when it encounters them. Here we will not include the fragment that does this, 〈Issue warning if unexpected radiance value is returned〉. See the implementation in cpu/integrator.cpp if you are interested in its details.
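For concreteness, here is a stand-alone sketch (in ordinary C++, not pbrt’s actual fragment, which can be found in cpu/integrator.cpp) of the kind of check that 〈Issue warning if unexpected radiance value is returned〉 performs; the fixed NSpectrumSamples constant and the error-reporting style here are illustrative stand-ins rather than pbrt’s own.

#include <cmath>
#include <cstdio>

constexpr int NSpectrumSamples = 4;

// Report radiance samples that can only be the result of a bug: NaNs and
// infinities should never appear in a correctly functioning renderer.
void CheckRadiance(const float L[NSpectrumSamples], int x, int y, int sampleIndex) {
    for (int i = 0; i < NSpectrumSamples; ++i) {
        if (std::isnan(L[i]))
            std::fprintf(stderr, "Not-a-number radiance value returned for pixel "
                         "(%d, %d), sample %d.\n", x, y, sampleIndex);
        else if (std::isinf(L[i]))
            std::fprintf(stderr, "Infinite radiance value returned for pixel "
                         "(%d, %d), sample %d.\n", x, y, sampleIndex);
    }
}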
After the radiance arriving at the ray’s origin is known, a call to Film::AddSample() updates the corresponding pixel in the image, given the weighted radiance for the sample. The details of how sample values are recorded in the film are explained in Sections 5.4 and 8.8.
〈Add camera ray’s contribution to image〉 ≡
    camera.GetFilm().AddSample(pPixel, L, lambda, &visibleSurface,
                               cameraSample.filterWeight);
Although it has taken a few pages to go through the implementation of the integrator infrastructure that culminated in RayIntegrator, we can now turn to implementing light transport integration algorithms in a simpler context than having to start implementing a complete Integrator::Render() method. The RandomWalkIntegrator that we will describe in this section inherits from RayIntegrator, and thus the details of multi-threading, generating the initial ray from the camera, and adding the radiance along that ray to the image are all taken care of. The integrator operates in a simpler context: a ray has been provided and its task is to compute the radiance arriving at its origin.
Recall that in Section 1.2.7 we mentioned that in the absence of participating media, the light carried by a ray is unchanged as it passes through free space. We will ignore the possibility of participating media in the implementation of this integrator, which allows us to take a first step: given the first intersection of a ray with the geometry in the scene, the radiance arriving at the ray’s origin is equal to the radiance leaving the intersection point toward the ray’s origin. That outgoing radiance is given by the light transport equation (1.1), though it is hopeless to evaluate it in closed form. Numerical approaches are required, and the ones used in pbrt are based on Monte Carlo integration, which makes it possible to estimate the values of integrals based on pointwise evaluation of their integrands. Chapter 2 provides an introduction to Monte Carlo integration, and additional Monte Carlo techniques will be introduced as they are used throughout the book.
Figure 1.19: A View of the Watercolor Scene, Rendered with the RandomWalkIntegrator. Because the RandomWalkIntegrator does not handle perfectly specular surfaces, the two glasses on the table are black. Furthermore, even with the 8,192 samples per pixel used to render this image, the result is still peppered with high-frequency noise. (Note, for example, the far wall and the base of the chair.) (Scene courtesy of Angelo Ferretti.)
In order to compute the outgoing radiance, the RandomWalkIntegrator implements a simple Monte Carlo approach that is based on incrementally constructing a random walk, where a series of points on scene surfaces are randomly chosen in succession to construct light-carrying paths starting from the camera. This approach effectively models image formation in the real world in reverse, starting from the camera rather than from the light sources. Going backward in this respect is still physically valid because the physical models of light that pbrt is based on are time-reversible.
Although the implementation of the random walk sampling algorithm is in total just over twenty lines of code, it is capable of simulating complex lighting and shading effects; Figure 1.19 shows an image rendered using it. (That image required many hours of computation to achieve that level of quality, however.) For the remainder of this section, we will gloss over a few of the mathematical details of the integrator’s implementation and focus on an intuitive understanding of the approach, though subsequent chapters will fill in the gaps and explain this and more sophisticated techniques more rigorously.
〈RandomWalkIntegrator Definition〉 ≡
class RandomWalkIntegrator : public RayIntegrator {
  public:
    〈RandomWalkIntegrator Public Methods 33〉
  private:
    〈RandomWalkIntegrator Private Methods 33〉
    〈RandomWalkIntegrator Private Members 34〉
};
This integrator recursively evaluates the random walk. Therefore, its Li() method implementation does little more than start the recursion, via a call to the LiRandomWalk() method. Most of the parameters to Li() are just passed along, though the VisibleSurface is ignored for this simple integrator and an additional parameter is added to track the depth of recursion.
〈RandomWalkIntegrator Public Methods〉 ≡
    SampledSpectrum Li(RayDifferential ray, SampledWavelengths &lambda,
                       Sampler sampler, ScratchBuffer &scratchBuffer,
                       VisibleSurface *visibleSurface) const {
        return LiRandomWalk(ray, lambda, sampler, scratchBuffer, 0);
    }
〈RandomWalkIntegrator Private Methods〉 ≡
    SampledSpectrum LiRandomWalk(RayDifferential ray, SampledWavelengths &lambda,
                                 Sampler sampler, ScratchBuffer &scratchBuffer,
                                 int depth) const {
        〈Intersect ray with scene and return if no intersection 33〉
        〈Get emitted radiance at surface intersection 34〉
        〈Terminate random walk if maximum depth has been reached 35〉
        〈Compute BSDF at random walk intersection point 35〉
        〈Randomly sample direction leaving surface for random walk 35〉
        〈Evaluate BSDF at surface for sampled direction 35〉
        〈Recursively trace ray to estimate incident radiance at surface 35〉
    }
The first step is to find the closest intersection of the ray with the shapes in the scene. If no intersection is found, the ray has left the scene. Otherwise, a SurfaceInteraction that is returned as part of the ShapeIntersection structure provides information about the local geometric properties of the intersection point.
〈Intersect ray with scene and return if no intersection〉 ≡
    pstd::optional<ShapeIntersection> si = Intersect(ray);
    if (!si) {
        〈Return emitted light from infinite light sources 34〉
    }
    SurfaceInteraction &isect = si->intr;
If no intersection was found, radiance still may be carried along the ray due to light sources such as the ImageInfiniteLight that do not have geometry associated with them. The Light::Le() method allows such lights to return their radiance for a given ray.
〈Return emitted light from infinite light sources〉 ≡
    SampledSpectrum Le(0.f);
    for (Light light : infiniteLights)
        Le += light.Le(ray, lambda);
    return Le;
If a valid intersection has been found, we must evaluate the light transport equation at the intersection point. The first term, Le(p, ωo), which is the emitted radiance, is easy: emission is part of the scene specification and the emitted radiance is available by calling the SurfaceInteraction::Le() method, which takes the outgoing direction of interest. Here, we are interested in radiance emitted back along the ray’s direction. If the object is not emissive, that method returns a zero-valued spectral distribution.
〈Get emitted radiance at surface intersection〉 ≡
    Vector3f wo = -ray.d;
    SampledSpectrum Le = isect.Le(wo, lambda);
Evaluating the second term of the light transport equation requires computing an integral over the sphere of directions around the intersection point p. Application of the principles of Monte Carlo integration can be used to show that if directions ω′ are chosen with equal probability over all possible directions, then an estimate of the integral can be computed as a weighted product of the BSDF f, which describes the light scattering properties of the material at p, the incident lighting, Li, and a cosine factor:

$$\int_{S^2} f(\mathrm{p}, \omega_o, \omega')\, L_i(\mathrm{p}, \omega')\, |\cos\theta'|\, \mathrm{d}\omega' \approx \frac{f(\mathrm{p}, \omega_o, \omega')\, L_i(\mathrm{p}, \omega')\, |\cos\theta'|}{1/(4\pi)}. \tag{1.2}$$
In other words, given a random direction ω′, estimating the value of the integral requires evaluating the terms in the integrand for that direction and then scaling by a factor of 4π. (This factor, which is derived in Section A.5.2, relates to the surface area of a unit sphere.) Since only a single direction is considered, there is almost always error in the Monte Carlo estimate compared to the true value of the integral. However, it can be shown that estimates like this one are correct in expectation: informally, that they give the correct result on average. Averaging multiple independent estimates generally reduces this error—hence, the practice of taking multiple samples per pixel.
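The following stand-alone program (not part of pbrt) illustrates this estimator with a simple integrand: it averages f(ω′)/(1/(4π)) over uniformly sampled directions for f(ω) = cos²θ, whose exact integral over the sphere is 4π/3. The sampling recipe mirrors what SampleUniformSphere() does, though the code here is an independent sketch.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double Pi = 3.14159265358979323846;
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> u(0., 1.);

    const int n = 1000000;
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        // Uniformly sample a direction on the unit sphere from two uniform
        // values: cos(theta) uniform in [-1, 1], phi uniform in [0, 2*pi).
        double cosTheta = 1 - 2 * u(rng);
        double sinTheta = std::sqrt(std::max(0., 1 - cosTheta * cosTheta));
        double phi = 2 * Pi * u(rng);
        double wp[3] = {sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta};

        double f = wp[2] * wp[2];   // integrand: cos^2(theta)
        sum += f / (1 / (4 * Pi));  // divide by the uniform pdf 1/(4*pi)
    }
    std::printf("estimate = %f, exact = %f\n", sum / n, 4 * Pi / 3);
    return 0;
}

With a million samples, the printed estimate should typically land within a fraction of a percent of 4π/3; halving the error requires roughly four times as many samples, which is characteristic of Monte Carlo integration.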
The BSDF and the cosine factor of the estimate are easily evaluated, leaving us with Li, the incident radiance, unknown. However, note that we have found ourselves right back where we started with the initial call to LiRandomWalk(): we have a ray for which we would like to find the incident radiance at the origin—that, a recursive call to LiRandomWalk() will provide.
Before computing the estimate of the integral, we must consider terminating the recursion. The RandomWalkIntegrator stops at a predetermined maximum depth, maxDepth. Without this termination criterion, the algorithm might never terminate (imagine, e.g., a hall-of-mirrors scene). This member variable is initialized in the constructor based on a parameter that can be set in the scene description file.
〈RandomWalkIntegrator Private Members〉 ≡
    int maxDepth;
〈Terminate random walk if maximum depth has been reached〉 ≡
    if (depth == maxDepth)
        return Le;
If the random walk is not terminated, the SurfaceInteraction::GetBSDF() method is called to find the BSDF at the intersection point. It evaluates texture functions to determine surface properties and then initializes a representation of the BSDF. It generally needs to allocate memory for the objects that constitute the BSDF’s representation; because this memory only needs to be active when processing the current ray, the ScratchBuffer is provided to it to use for its allocations.
〈Compute BSDF at random walk intersection point〉 ≡
    BSDF bsdf = isect.GetBSDF(ray, lambda, camera, scratchBuffer, sampler);
Next, we need to sample a random direction ω′ to compute the estimate in Equation (1.2). The SampleUniformSphere() function returns a uniformly distributed direction on the unit sphere, given two uniform values in [0, 1) that are provided here by the sampler.
〈Randomly sample direction leaving surface for random walk〉 ≡
    Point2f u = sampler.Get2D();
    Vector3f wp = SampleUniformSphere(u);
All the factors of the Monte Carlo estimate other than the incident radiance can now be readily evaluated. The BSDF class provides an f() method that evaluates the BSDF for a pair of specified directions, and the cosine of the angle with the surface normal can be computed using the AbsDot() function, which returns the absolute value of the dot product between two vectors. If the vectors are normalized, as both are here, this value is equal to the absolute value of the cosine of the angle between them (Section 3.3.2).
It is possible that the BSDF will be zero-valued for the provided directions and thus that fcos will be as well—for example, the BSDF is zero if the surface is not transmissive but the two directions are on opposite sides of it. In that case, there is no reason to continue the random walk, since subsequent points will make no contribution to the result.
〈Evaluate BSDF at surface for sampled direction〉 ≡
    SampledSpectrum fcos = bsdf.f(wo, wp) * AbsDot(wp, isect.shading.n);
    if (!fcos)
        return Le;
The remaining task is to compute the new ray leaving the surface in the sampled direction ω′. This task is handled by the SpawnRay() method, which returns a ray leaving an intersection in the provided direction, ensuring that the ray is sufficiently offset from the surface that it does not incorrectly reintersect it due to round-off error. Given the ray, the recursive call to LiRandomWalk() can be made to estimate the incident radiance, which completes the estimate of Equation (1.2).
〈Recursively trace ray to estimate incident radiance at surface〉 ≡
    ray = isect.SpawnRay(wp);
    return Le + fcos * LiRandomWalk(ray, lambda, sampler, scratchBuffer,
                                    depth + 1) / (1 / (4 * Pi));
Figure 1.20: Watercolor Scene Rendered Using 32 Samples per Pixel. (a) Rendered using the RandomWalkIntegrator. (b) Rendered using the PathIntegrator, which follows the same general approach but uses more sophisticated Monte Carlo techniques. The PathIntegrator gives a substantially better image for roughly the same amount of work, with 54.5× reduction in mean squared error.
This simple approach has many shortcomings. For example, if the emissive surfaces are small, most ray paths will not find any light and many rays will need to be traced to form an accurate image. In the limiting case of a point light source, the image will be black, since there is zero probability of intersecting such a light source. Similar issues apply with BSDF models that scatter light in a concentrated set of directions. In the limiting case of a perfect mirror that scatters incident light along a single direction, the RandomWalkIntegrator will never be able to randomly sample that direction.
Those issues and more can be addressed through more sophisticated application of Monte Carlo integration techniques. In subsequent chapters, we will introduce a succession of improvements that lead to much more accurate results. The integrators that are defined in Chapters 13 through 15 are the culmination of those developments. All still build on the same basic ideas used in the RandomWalkIntegrator, but are much more efficient and robust than it is. Figure 1.20 compares the RandomWalkIntegrator to one of the improved integrators and gives a sense of how much improvement is possible.
1.4 HOW TO PROCEED THROUGH THIS BOOK
We have written this book assuming it will be read in roughly front-to-back order. We have tried to minimize the number of forward references to ideas and interfaces that have not yet been introduced, but we do assume that the reader is acquainted with the previous content at any particular point in the text. Some sections go into depth about advanced topics that some readers may wish to skip over, particularly on first reading; each advanced section is identified by an asterisk in its title.
Because of the modular nature of the system, the main requirements are that the reader be familiar with the low-level classes like Point3f, Ray, and SampledSpectrum; the interfaces defined by the abstract base classes listed in Table 1.1; and the rendering loop that culminates in calls to integrators’ RayIntegrator::Li() methods. Given that knowledge, for example, the reader who does not care about precisely how a camera model based on a perspective projection matrix maps CameraSamples to rays can skip over the implementation of that camera and can just remember that the Camera::GenerateRayDifferential() method somehow turns a CameraSample into a RayDifferential.
The remainder of this book is divided into four main parts of a few chapters each. First, Chapters 2 through 4 introduce the foundations of the system. A brief introduction to the key ideas underlying Monte Carlo integration is provided in Chapter 2, and Chapter 3 then describes widely used geometric classes like Point3f, Ray, and Bounds3f. Chapter 4 introduces the physical units used to measure light and the SampledSpectrum class that pbrt uses to represent spectral distributions. It also discusses color, the human perception of spectra, which affects how input is provided to the renderer and how it generates output.
The second part of the book covers image formation and how the scene geometry is represented. Chapter 5 defines the Camera interface and a few different camera implementations before discussing the overall process of turning spectral radiance arriving at the film into images. Chapter 6 then introduces the Shape interface and gives implementations of a number of shapes, including showing how to perform ray intersection tests with them. Chapter 7 describes the implementations of the acceleration structures that make ray tracing more efficient by skipping tests with primitives that a ray can be shown to definitely not intersect. Finally, Chapter 8’s topic is the Sampler classes that place samples on the image plane and provide random samples for Monte Carlo integration.
The third part of the book is about light and how it scatters from surfaces and participating media. Chapter 9 includes a collection of classes that define a variety of types of reflection from surfaces. Materials, described in Chapter 10, use these reflection functions to implement a number of different surface types, such as plastic, glass, and metal. Spatial variation in material properties (color, roughness, etc.) is modeled by textures, which are also described in Chapter 10. Chapter 11 introduces the abstractions that describe how light is scattered and absorbed in participating media, and Chapter 12 then describes the interface for light sources and a variety of light source implementations.
The last part brings all the ideas from the rest of the book together to implement a number of interesting light transport algorithms. The integrators in Chapters 13 and 14 represent a variety of different applications of Monte Carlo integration to compute more accurate approximations of the light transport equation than the RandomWalkIntegrator. Chapter 15 then describes the implementation of a high-performance integrator that runs on the GPU, based on all the same classes that are used in the implementations of the CPU-based integrators.
Chapter 16, the last chapter of the book, provides a brief retrospective and discussion of system design decisions along with a number of suggestions for more far-reaching projects than those in the exercises. Appendices contain more Monte Carlo sampling algorithms, describe utility functions, and explain details of how the scene description is created as the input file is parsed.
At the end of each chapter you will find exercises related to the material covered in that chapter. Each exercise is marked as one of three levels of difficulty:
➊ An exercise that should take only an hour or two
➋ A reading and/or implementation task that would be suitable for a course assignment and should take between 10 and 20 hours of work
➌ A suggested final project for a course that will likely take 40 hours or more to complete
Figures throughout the book compare the results of rendering the same scene using different algorithms. As with previous editions of the book, we have done our best to ensure that these differences are evident on the printed page, though even high quality printing cannot match modern display technology, especially now with the widespread availability of high dynamic range displays.
We have therefore made all of the rendered images that are used in figures available online. For example, the first image shown in this chapter as Figure 1.1 is available at the URL pbr-book.org/4ed/fig/1.1. All of the others follow the same naming scheme.
Starting on November 1, 2023, the full contents of this book will be freely available online at pbr-book.org/4ed. (The previous edition of the book is already available at that website.)
The online edition includes additional content that could not be included in the printed book due to page constraints. All of that material is supplementary to the contents of this book. For example, it includes the implementation of an additional camera model, a kd-tree acceleration structure, and a full chapter on bidirectional light transport algorithms. (Almost all of the additional material appeared in the previous edition of the book.)
1.5 USING AND UNDERSTANDING THE CODE
The pbrt source code distribution is available from pbrt.org. The website also includes additional documentation, images rendered with pbrt, example scenes, errata, and links to a bug reporting system. We encourage you to visit the website and subscribe to the pbrt mailing list.
pbrt is written in C++, but we have tried to make it accessible to non-C++ experts by limiting the use of esoteric features of the language. Staying close to the core language features also helps with the system’s portability. We make use of C++’s extensive standard library whenever it is applicable but will not discuss the semantics of calls to standard library functions in the text. Our expectation is that the reader will consult documentation of the standard library as necessary.
We will occasionally omit short sections of pbrt’s source code from the book. For example, when there are a number of cases to be handled, all with nearly identical code, we will present one case and note that the code for the remaining cases has been omitted from the text. Default class constructors are generally not shown, and the text also does not include details like the various #include directives at the start of each source file. All the omitted code can be found in the pbrt source code distribution.
1.5.1 SOURCE CODE ORGANIZATION
The source code used for building pbrt is under the src directory in the pbrt distribution. In that directory are src/ext, which has the source code for various third-party libraries that are used by pbrt, and src/pbrt, which contains pbrt’s source code. We will not discuss the third-party libraries’ implementations in the book.
The source files in the src/pbrt directory mostly consist of implementations of the various interface types. For example, shapes.h and shapes.cpp have implementations of the Shape interface, materials.h and materials.cpp have materials, and so forth. That directory also holds the source code for parsing pbrt’s scene description files.
The pbrt.h header file in src/pbrt is the first file that is included by all other source files in the system. It contains a few macros and widely useful forward declarations, though we have tried to keep it short and to minimize the number of other headers that it includes in the interests of compile time efficiency.
The src/pbrt directory also contains a number of subdirectories. They have the following roles:
Functions and classes are generally named using Camel case, with the first letter of each word capitalized and no delineation for spaces. One exception is some methods of container classes, which follow the naming convention of the C++ standard library when they have matching functionality (e.g., size() and begin() and end() for iterators). Variables also use Camel case, though with the first letter lowercase, except for a few global variables.
We also try to match mathematical notation in naming: for example, we use variables like p for points p and w for directions ω. We will occasionally add a p to the end of a variable to denote a primed symbol: wp for ω′. Underscores are used to indicate subscripts in equations: theta_o for θo, for example.
Our use of underscores is not perfectly consistent, however. Short variable names often omit the underscore—we use wi for ωi and we have already seen the use of Li for Li. We also occasionally use an underscore to separate a word from a lowercase mathematical symbol. For example, we use Sample_f for a method that samples a function f rather than Samplef, which would be more difficult to read, or SampleF, which would obscure the connection to the function f (“where was the function F defined?”).
C++ provides two different mechanisms for passing an object to a function or method by reference: pointers and references. If a function argument is not intended as an output variable, either can be used to save the expense of passing the entire structure on the stack. The convention in pbrt is to use a pointer when the argument will be completely changed by the function or method, a reference when some of its internal state will be changed but it will not be fully reinitialized, and const references when it will not be changed at all. One important exception to this rule is that we will always use a pointer when we want to be able to pass nullptr to indicate that a parameter is not available or should not be used.
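The following made-up example (not code from pbrt) illustrates these three conventions; the Stats structure and functions here are purely for illustration.

struct Stats {
    int nRays = 0;
    double tTotal = 0;
};

// Pointer: the callee completely reinitializes the argument.
void ResetStats(Stats *stats) { *stats = Stats(); }

// Non-const reference: some internal state is modified, but the object is
// not fully reinitialized.
void RecordRay(Stats &stats, double t) {
    ++stats.nRays;
    stats.tTotal += t;
}

// Const reference: the argument is not modified at all.
double AverageTime(const Stats &stats) {
    return stats.nRays > 0 ? stats.tTotal / stats.nRays : 0.;
}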
1.5.4 ABSTRACTION VERSUS EFFICIENCY
One of the primary tensions when designing interfaces for software systems is making a reasonable trade-off between abstraction and efficiency. For example, many programmers religiously make all data in all classes private and provide methods to obtain or modify the values of the data items. For simple classes (e.g., Vector3f), we believe that approach needlessly hides a basic property of the implementation—that the class holds three floating-point coordinates—that we can reasonably expect to never change. Of course, using no information hiding and exposing all details of all classes’ internals leads to a code maintenance nightmare, but we believe that there is nothing wrong with judiciously exposing basic design decisions throughout the system. For example, the fact that a Ray is represented with a point, a vector, a time, and the medium it is in is a decision that does not need to be hidden behind a layer of abstraction. Code elsewhere is shorter and easier to understand when details like these are exposed.
An important thing to keep in mind when writing a software system and making these sorts of trade-offs is the expected final size of the system. pbrt is roughly 70,000 lines of code and it is never going to grow to be a million lines of code; this fact should be reflected in the amount of information hiding used in the system. It would be a waste of programmer time (and likely a source of runtime inefficiency) to design the interfaces to accommodate a system of a much higher level of complexity.
We have reimplemented a subset of the C++ standard library in the pstd namespace; this was necessary in order to use those parts of it interchangeably on the CPU and on the GPU. For the purposes of reading pbrt’s source code, anything in pstd provides the same functionality with the same type and methods as the corresponding entity in std. We will therefore not document usage of pstd in the text here.
Almost all dynamic memory allocation for the objects that represent the scene in pbrt is performed using an instance of an Allocator that is provided to the object creation methods. In pbrt, Allocator is shorthand for the C++ standard library’s pmr::polymorphic_allocator type. Its definition is in pbrt.h so that it is available to all other source files.
〈Define Allocator〉 ≡
using Allocator = pstd::pmr::polymorphic_allocator<std::byte>;
std::pmr::polymorphic_allocator implementations provide a few methods for allocating and freeing objects. These three are used widely in pbrt:
void *allocate_bytes(size_t nbytes, size_t alignment);
template <class T> T *allocate_object(size_t n = 1);
template <class T, class ...Args> T *new_object(Args &&...args);
The first, allocate_bytes(), allocates the specified number of bytes of memory. Next, allocate_object() allocates uninitialized memory for an array of n objects of the specified type T; it does not run their constructors. The final method, new_object(), allocates a single object of type T and calls its constructor with the provided arguments. There are corresponding methods for freeing each type of allocation: deallocate_bytes(), deallocate_object(), and delete_object().
A tricky detail related to the use of allocators with data structures from the C++ standard library is that a container’s allocator is fixed once its constructor has run. Thus, if one container is assigned to another, the target container’s allocator is unchanged even though all the values it stores are updated. (This is the case even with C++’s move semantics.) Therefore, it is common to see objects’ constructors in pbrt passing along an allocator in member initializer lists for containers that they store even if they are not yet ready to set the values stored in them.
Using an explicit memory allocator rather than direct calls to new and delete has a few advantages. Not only does it make it easy to do things like track the total amount of memory that has been allocated, but it also makes it easy to substitute allocators that are optimized for many small allocations, as is useful when building acceleration structures in Chapter 7. Using allocators in this way also makes it easy to store the scene objects in memory that is visible to the GPU when GPU rendering is being used.
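The short program below is a stand-alone sketch of these methods in action; it uses C++20’s std::pmr types directly rather than pbrt’s pstd versions, which present the same interface. A monotonic_buffer_resource stands in for the kind of pool-based resource that makes many small allocations cheap.

#include <cstddef>
#include <cstdio>
#include <memory_resource>
#include <vector>

struct Node {
    // A container's allocator is fixed once it has been constructed, so the
    // allocator is passed along in the member initializer list.
    explicit Node(std::pmr::polymorphic_allocator<std::byte> alloc)
        : children(alloc) {}
    std::pmr::vector<int> children;
};

int main() {
    std::pmr::monotonic_buffer_resource pool;
    std::pmr::polymorphic_allocator<std::byte> alloc(&pool);

    // new_object(): allocate a single object and run its constructor.
    Node *node = alloc.new_object<Node>(alloc);
    node->children.push_back(42);

    // allocate_object(): memory for an array of 16 floats from the pool.
    float *samples = alloc.allocate_object<float>(16);
    samples[0] = 0.5f;

    std::printf("%d %f\n", node->children[0], samples[0]);

    // delete_object() runs the destructor; the monotonic resource releases
    // all of its memory at once when it goes out of scope.
    alloc.delete_object(node);
    return 0;
}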
As mentioned in Section 1.3, virtual functions are generally not used for dynamic dispatch with polymorphic types in pbrt (the main exception being the Integrators). Instead, the TaggedPointer class is used to represent a pointer to one of a specified set of types; it includes machinery for runtime type identification and thence dynamic dispatch. (Its implementation can be found in Appendix B.4.4.) Two considerations motivate its use.
First, in C++, an instance of an object that inherits from an abstract base class includes a hidden virtual function table pointer that is used to resolve virtual function calls. On most modern systems, this pointer uses eight bytes of memory. While eight bytes may not seem like much, we have found that when rendering complex scenes with previous versions of pbrt, a substantial amount of memory would be used just for virtual function pointers for shapes and primitives. With the TaggedPointer class, there is no incremental storage cost for type information.
The other problem with virtual function tables is that they store function pointers that point to executable code. Of course, that’s what they are supposed to do, but this characteristic means that a virtual function table can be valid for method calls from either the CPU or from the GPU, but not from both simultaneously, since the executable code for the different processors is stored at different memory locations. When using the GPU for rendering, it is useful to be able to call methods from both processors, however.
For all the code that just calls methods of polymorphic objects, the use of pbrt's TaggedPointer in place of virtual functions makes no difference other than the fact that method calls are made using the . operator, just as would be used for a C++ reference. Section 4.5.1, which introduces Spectrum, the first class based on TaggedPointer that occurs in the book, has more details about how pbrt's dynamic dispatch scheme is implemented.
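As a rough stand-alone sketch of the same idea (and emphatically not pbrt’s TaggedPointer, whose implementation is in Appendix B.4.4), the example below uses std::variant: the dynamic type is encoded in a small tag stored with the value rather than in a per-object virtual function table pointer, and dispatch proceeds by inspecting that tag.

#include <cstdio>
#include <variant>

struct Sphere {
    float radius;
    float Area() const { return 4 * 3.14159265f * radius * radius; }
};
struct Triangle {
    float area;
    float Area() const { return area; }
};

// One handle type that can refer to any of the supported shape types.
using ShapeHandle = std::variant<Sphere, Triangle>;

float Area(const ShapeHandle &shape) {
    // std::visit performs tag-based dispatch; no virtual functions involved.
    return std::visit([](const auto &s) { return s.Area(); }, shape);
}

int main() {
    ShapeHandle shapes[2] = {Sphere{1.f}, Triangle{0.5f}};
    for (const ShapeHandle &s : shapes)
        std::printf("area = %f\n", Area(s));
    return 0;
}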
We have tried to make pbrt efficient through the use of well-chosen algorithms rather than through local micro-optimizations, so that the system can be more easily understood. However, efficiency is an integral part of rendering, and so we discuss performance issues throughout the book.
For both CPUs and GPUs, processing performance continues to grow more quickly than the speed at which data can be loaded from main memory into the processor. This means that waiting for values to be fetched from memory can be a major performance limitation. The most important optimizations that we discuss relate to minimizing unnecessary memory access and organizing algorithms and data structures in ways that lead to coherent access patterns; paying attention to these issues can speed up program execution much more than reducing the total number of instructions executed.
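A stand-alone illustration (not taken from pbrt) of why access patterns matter: both loops below compute the same sum, but the first visits memory contiguously while the second strides across rows of the array, which typically makes far poorer use of caches for large arrays and can be several times slower.

#include <cstdio>
#include <vector>

int main() {
    const int n = 4096;
    std::vector<float> a(n * n, 1.f);  // row-major layout: a[y * n + x]

    double rowMajorSum = 0;
    for (int y = 0; y < n; ++y)        // coherent: consecutive addresses
        for (int x = 0; x < n; ++x)
            rowMajorSum += a[y * n + x];

    double colMajorSum = 0;
    for (int x = 0; x < n; ++x)        // incoherent: stride of n floats
        for (int y = 0; y < n; ++y)
            colMajorSum += a[y * n + x];

    std::printf("%f %f\n", rowMajorSum, colMajorSum);
    return 0;
}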
Debugging a renderer can be challenging, especially in cases where the result is correct most of the time but not always. pbrt includes a number of facilities to ease debugging.
One of the most important is a suite of unit tests. We have found unit testing to be invaluable in the development of pbrt for the reassurance it gives that the tested functionality is very likely to be correct. Having this assurance relieves the concern behind questions during debugging such as “am I sure that the hash table that is being used here is not itself the source of my bug?” Alternatively, a failing unit test is almost always easier to debug than an incorrect image generated by the renderer; many of the tests have been added along the way as we have debugged pbrt. Unit tests for a file code.cpp are found in code_tests.cpp. All the unit tests are executed by an invocation of the pbrt_test executable and specific ones can be selected via command-line options.
There are many assertions throughout the pbrt codebase, most of them not included in the book text. These check conditions that should never be true and issue an error and exit immediately if they are found to be true at runtime. (See Section B.3.6 for the definitions of the assertion macros used in pbrt.) A failed assertion gives a first hint about the source of an error; like a unit test, an assertion helps focus debugging, at least with a starting point. Some of the more computationally expensive assertions in pbrt are only enabled for debug builds; if the renderer is crashing or otherwise producing incorrect output, it is worthwhile to try running a debug build to see if one of those additional assertions fails and yields a clue.
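These are not pbrt’s actual assertion macros (Section B.3.6 defines the real ones), but the following minimal sketch shows the general pattern the text describes: cheap checks stay enabled in all builds, while more expensive ones compile away outside of debug builds.

#include <cstdio>
#include <cstdlib>

#define CHECK(cond)                                                        \
    do {                                                                   \
        if (!(cond)) {                                                     \
            std::fprintf(stderr, "Check failed: %s at %s:%d\n", #cond,     \
                         __FILE__, __LINE__);                              \
            std::abort();                                                  \
        }                                                                  \
    } while (false)

#ifdef NDEBUG
#define DCHECK(cond) ((void)0)  // compiled out of optimized builds
#else
#define DCHECK(cond) CHECK(cond)
#endif

int main() {
    int samplesPerPixel = 16;
    CHECK(samplesPerPixel > 0);        // always verified
    DCHECK(samplesPerPixel % 2 == 0);  // verified only in debug builds
    return 0;
}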
We have also endeavored to make the execution of pbrt at a given pixel sample deterministic. One challenge with debugging a renderer is a crash that only happens after minutes or hours of rendering computation. With deterministic execution, rendering can be restarted at a single pixel sample in order to more quickly return to the point of a crash. Furthermore, upon a crash pbrt will print a message such as “Rendering failed at pixel (16, 27) sample 821. Debug with --debugstart 16,27,821”. The values printed after “debugstart” depend on the integrator being used, but are sufficient to restart its computation close to the point of a crash.
Finally, it is often useful to print out the values stored in a data structure during the course of debugging. We have implemented ToString() methods for nearly all of pbrt’s classes. They return a std::string representation of them so that it is easy to print their full object state during program execution. Furthermore, pbrt’s custom Printf() and StringPrintf() functions (Section B.3.3) automatically use the string returned by ToString() for an object when a %s specifier is found in the formatting string.
1.5.10 PARALLELISM AND THREAD SAFETY
In pbrt (as is the case for most ray tracers), the vast majority of data at rendering time is read only (e.g., the scene description and texture images). Much of the parsing of the scene file and creation of the scene representation in memory is done with a single thread of execution, so there are few synchronization issues during that phase of execution. During rendering, concurrent read access to all the read-only data by multiple threads works with no problems on both the CPU and the GPU; we only need to be concerned with situations where data in memory is being modified.
As a general rule, the low-level classes and structures in the system are not thread-safe. For example, the Point3f class, which stores three float values to represent a point in 3D space, is not safe for multiple threads to call methods that modify it at the same time. (Multiple threads can use Point3fs as read-only data simultaneously, of course.) The runtime overhead to make Point3f thread-safe would have a substantial effect on performance with little benefit in return.
The same is true for classes like Vector3f, Normal3f, SampledSpectrum, Transform, Quaternion, and SurfaceInteraction. These classes are usually either created at scene construction time and then used as read-only data or allocated on the stack during rendering and used only by a single thread.
The utility classes ScratchBuffer (used for high-performance temporary memory allocation) and RNG (pseudo-random number generation) are also not safe for use by multiple threads; these classes store state that is modified when their methods are called, and the overhead from protecting modification to their state with mutual exclusion would be excessive relative to the amount of computation they perform. Consequently, in code like the ImageTileIntegrator::Render() method earlier, pbrt allocates per-thread instances of these classes on the stack.
With two exceptions, implementations of the base types listed in Table 1.1 are safe for multiple threads to use simultaneously. With a little care, it is usually straightforward to implement new instances of these base classes so they do not modify any shared state in their methods.
The first exceptions are the Light Preprocess() method implementations. These are called by the system during scene construction, and implementations of them generally modify shared state in their objects. Therefore, it is helpful to allow the implementer to assume that only a single thread will call into these methods. (This is a separate issue from the consideration that implementations of these methods that are computationally intensive may use ParallelFor() to parallelize their computation.)
The second exception is Sampler class implementations; their methods are also not expected to be thread-safe. This is another instance where this requirement would impose an excessive performance and scalability impact; many threads simultaneously trying to get samples from a single Sampler would limit the system’s overall performance. Therefore, as described in Section 1.3.4, a unique Sampler is created for each rendering thread using Sampler::Clone().
All stand-alone functions in pbrt are thread-safe (as long as multiple threads do not pass pointers to the same data to them).
One of our goals in writing this book and building the pbrt system was to make it easier for developers and researchers to experiment with new (or old!) ideas in rendering. One of the great joys in computer graphics is writing new software that makes a new image; even small changes to the system can be fun to experiment with. The exercises throughout the book suggest many changes to make to the system, ranging from small tweaks to major open-ended research projects. Section C.4 in Appendix C has more information about the mechanics of adding new implementations of the interfaces listed in Table 1.1.
Although we made every effort to make pbrt as correct as possible through extensive testing, it is inevitable that some bugs are still present.
If you believe you have found a bug in the system, please do the following:
We will periodically update the pbrt source code repository with bug fixes and minor enhancements. (Be aware that we often let bug reports accumulate for a few months before going through them; do not take this as an indication that we do not value them!) However, we will not make major changes to the pbrt source code so that it does not diverge from the system described here in the book.
1.6 A BRIEF HISTORY OF PHYSICALLY BASED RENDERING
Through the early years of computer graphics in the 1970s, the most important problems to solve were fundamental issues like visibility algorithms and geometric representations. When a megabyte of RAM was a rare and expensive luxury and when a computer capable of a million floating-point operations per second cost hundreds of thousands of dollars, the complexity of what was possible in computer graphics was correspondingly limited, and any attempt to accurately simulate physics for rendering was infeasible.
As computers have become more capable and less expensive, it has become possible to consider more computationally demanding approaches to rendering, which in turn has made physically based approaches viable. This progression is neatly explained by Blinn’s law: “as technology advances, rendering time remains constant.”
Jim Blinn’s simple statement captures an important constraint: given a certain number of images that must be rendered (be it a handful for a research paper or over a hundred thousand for a feature film), it is only possible to take so much processing time for each one. One has a certain amount of computation available and one has some amount of time available before rendering must be finished, so the maximum computation per image is necessarily limited.
Blinn’s law also expresses the observation that there remains a gap between the images people would like to be able to render and the images that they can render: as computers have become faster, content creators have continued to use increased computational capability to render more complex scenes with more sophisticated rendering algorithms, rather than rendering the same scenes as before, just more quickly. Rendering continues to consume all computational capabilities made available to it.
Physically based approaches to rendering started to be seriously considered by graphics researchers in the 1980s. Whitted’s paper (1980) introduced the idea of using ray tracing for global lighting effects, opening the door to accurately simulating the distribution of light in scenes. The rendered images his approach produced were markedly different from any that had been seen before, which spurred excitement about this approach.
Another notable early advancement in physically based rendering was Cook and Torrance’s reflection model (1981, 1982), which introduced microfacet reflection models to graphics. Among other contributions, they showed that accurately modeling microfacet reflection made it possible to render metal surfaces accurately; metal was not well rendered by earlier approaches.
Shortly afterward, Goral et al. (1984) made connections between the thermal transfer literature and rendering, showing how to incorporate global diffuse lighting effects using a physically based approximation of light transport. This method was based on finite-element techniques, where areas of surfaces in the scene exchanged energy with each other. This approach came to be referred to as “radiosity,” after a related physical unit. Follow-on work by Cohen and Greenberg (1985) and Nishita and Nakamae (1985) introduced important improvements. Once again, a physically based approach led to images with lighting effects that had not previously been seen in rendered images, which led to many researchers pursuing improvements in this area.
While the radiosity approach was based on physical units and conservation of energy, in time it became clear that it would not lead to practical rendering algorithms: the asymptotic computational complexity was a difficult-to-manage O(n²), and it was necessary to retessellate geometric models along shadow boundaries for good results; researchers had difficulty developing robust and efficient tessellation algorithms for this purpose. Radiosity’s adoption in practice was limited.
During the radiosity years, a small group of researchers pursued physically based approaches to rendering that were based on ray tracing and Monte Carlo integration. At the time, many looked at their work with skepticism; objectionable noise in images due to Monte Carlo integration error seemed unavoidable, while radiosity-based methods quickly gave visually pleasing results, at least on relatively simple scenes.
In 1984, Cook, Porter, and Carpenter introduced distributed ray tracing, which generalized Whitted’s algorithm to compute motion blur and defocus blur from cameras, blurry reflection from glossy surfaces, and illumination from area light sources (Cook et al. 1984), showing that ray tracing was capable of generating a host of important soft lighting effects.
Shortly afterward, Kajiya (1986) introduced path tracing; he set out a rigorous formulation of the rendering problem (the light transport integral equation) and showed how to apply Monte Carlo integration to solve it. This work required immense amounts of computation: to render a 256 × 256 pixel image of two spheres with path tracing required 7 hours of computation on an IBM 4341 computer, which cost roughly $280,000 when it was first released (Farmer 1981). With von Herzen, Kajiya also introduced the volume-rendering equation to graphics (Kajiya and von Herzen 1984); this equation describes the scattering of light in participating media.
Both Cook et al.’s and Kajiya’s work once again led to images unlike any that had been seen before, demonstrating the value of physically based methods. In subsequent years, important work on Monte Carlo for realistic image synthesis was described in papers by Arvo and Kirk (1990) and Kirk and Arvo (1991). Shirley’s Ph.D. dissertation (1990) and follow-on work by Shirley et al. (1996) were important contributions to Monte Carlo–based efforts. Hall’s book, Illumination and Color in Computer Generated Imagery (1989), was one of the first books to present rendering in a physically based framework, and Andrew Glassner’s Principles of Digital Image Synthesis laid out foundations of the field (1995). Ward’s Radiance rendering system was an early open source physically based rendering system, focused on lighting design (Ward 1994), and Slusallek’s Vision renderer was designed to bridge the gap between physically based approaches and the then widely used RenderMan interface, which was not physically based (Slusallek 1996).
Following Torrance and Cook’s work, much of the research in the Program of Computer Graphics at Cornell University investigated physically based approaches. The motivations for this work were summarized by Greenberg et al. (1997), who made a strong argument for a physically accurate rendering based on measurements of the material properties of real-world objects and on deep understanding of the human visual system.
A crucial step forward for physically based rendering was Veach’s work, described in detail in his dissertation (Veach 1997). Veach advanced key theoretical foundations of Monte Carlo rendering while also developing new algorithms like multiple importance sampling, bidirectional path tracing, and Metropolis light transport that greatly improved its efficiency. Using Blinn’s law as a guide, we believe that these significant improvements in efficiency were critical to practical adoption of these approaches.
Around this time, as computers became faster and more parallel, a number of researchers started pursuing real-time ray tracing; Wald, Slusallek, and Benthin wrote an influential paper that described a highly optimized ray tracer that was much more efficient than previous ray tracers (Wald et al. 2001b). Many subsequent papers introduced increasingly more efficient ray-tracing algorithms. Though most of this work was not physically based, the results led to great progress in ray-tracing acceleration structures and performance of the geometric components of ray tracing. Because physically based rendering generally makes substantial use of ray tracing, this work has in turn had the same helpful effect as faster computers have, making it possible to render more complex scenes with physical approaches.
We end our summary of the key steps in the research progress of physically based rendering at this point, though much more has been done. The “Further Reading” sections in all the subsequent chapters of this book cover this work in detail.
With more capable computers in the 1980s, computer graphics could start to be used for animation and film production. Early examples include Jim Blinn’s rendering of the Voyager 2 flyby of Saturn in 1981 and visual effects in the movies Star Trek II: The Wrath of Khan (1982), Tron (1982), and The Last Starfighter (1984).
In early production use of computer-generated imagery, rasterization-based rendering (notably, the Reyes algorithm (Cook et al. 1987)) was the only viable option. One reason was that not enough computation was available for complex reflection models or for the global lighting effects that physically based ray tracing could provide. More significantly, rasterization had the important advantage that it did not require that the entire scene representation fit into main memory.
When RAM was much less plentiful, almost any interesting scene was too large to fit into main memory. Rasterization-based algorithms made it possible to render scenes while having only a small subset of the full scene representation in memory at any time. Global lighting effects are difficult to achieve if the whole scene cannot fit into main memory; for many years, with limited computer systems, content creators effectively decided that geometric and texture complexity was more important to visual realism than lighting complexity (and in turn physical accuracy).
Many practitioners at this time also believed that physically based approaches were undesirable for production: one of the great things about computer graphics is that one can cheat reality with impunity to achieve a desired artistic effect. For example, lighting designers on regular movies often struggle to place light sources so that they are not visible to the camera or spend considerable effort placing a light to illuminate an actor without shining too much light on the background. Computer graphics offers the opportunity to, for example, implement a light source model that shines twice as much light on a character as on a background object. For many years, this capability seemed much more useful than physical accuracy.
Visual effects practitioners who had the specific need to match rendered imagery to filmed real-world environments pioneered capturing real-world lighting and shading effects and were early adopters of physically based approaches in the late 1990s and early 2000s. (See Snow (2010) for a history of ILM’s early work in this area, for example.)
During this time, Blue Sky Studios adopted a physically based pipeline (Ohmer 1997). The photorealism of an advertisement they made for a Braun shaver in 1992 caught the attention of many, and their short film, Bunny, shown in 1998, was an early example of Monte Carlo global illumination used in production. Its visual look was substantially different from those of films and shorts rendered with Reyes and was widely noted. Subsequent feature films from Blue Sky also followed this approach. Unfortunately, Blue Sky never published significant technical details of their approach, limiting their wider influence.
During the early 2000s, the mental ray ray-tracing system was used by a number of studios, mostly for visual effects. It was an efficient ray tracer with sophisticated global illumination algorithm implementations. The main focus of its developers was computer-aided design and product design applications, so it lacked features like the ability to handle extremely complex scenes and the enormous numbers of texture maps that film production demanded.
After Bunny, another watershed moment came in 2001, when Marcos Fajardo came to the SIGGRAPH conference with an early version of his Arnold renderer. He showed images in the Monte Carlo image synthesis course that not only had complex geometry, textures, and global illumination but also were rendered in tens of minutes. While these scenes were not of the complexity of those used in film production at the time, his results showed many the creative opportunities from the combination of global illumination and complex scenes.
Fajardo brought Arnold to Sony Pictures Imageworks, where work started to transform it to a production-capable physically based rendering system. Many issues had to be addressed, including efficient motion blur, programmable shading, support for massively complex scenes, and deferred loading of scene geometry and textures. Arnold was first used on the movie Monster House and is now available as a commercial product.
In the early 2000s, Pixar’s RenderMan renderer started to support hybrid rasterization and ray-tracing algorithms and included a number of innovative algorithms for computing global illumination solutions in complex scenes. RenderMan was recently rewritten to be a physically based ray tracer, following the general system architecture of pbrt (Christensen 2015).
Figure 1.21: Gravity (2013) featured spectacular computer-generated imagery of a realistic space environment with volumetric scattering and large numbers of anisotropic metal surfaces. The image was generated using Arnold, a physically based rendering system that accounts for global illumination. Image courtesy of Warner Bros. and Framestore.
One of the main reasons that physically based Monte Carlo approaches to rendering have been successful in production is that they end up improving the productivity of artists. These have been some of the important factors:
As of this writing, physically based rendering is used widely for producing computer-generated imagery for movies; Figures 1.21 and 1.22 show images from two recent movies that used physically based approaches.
In a seminal early paper, Arthur Appel (1968) first described the basic idea of ray tracing to solve the hidden surface problem and to compute shadows in polygonal scenes. Goldstein and Nagel (1971) later showed how ray tracing could be used to render scenes with quadric surfaces. Kay and Greenberg (1979) described a ray-tracing approach to rendering transparency, and Whitted’s seminal CACM article described a general recursive ray-tracing algorithm that accurately simulates reflection and refraction from specular surfaces and shadows from point light sources (Whitted 1980). Whitted has recently written an article describing developments over the early years of ray tracing (Whitted 2020).
Figure 1.22: This image from Alita: Battle Angel (2019) was also rendered using a physically based rendering system. Image by Weta Digital, © 2018 Twentieth Century Fox Film Corporation. All Rights Reserved.
In addition to the ones discussed in Section 1.6, notable early books on physically based rendering and image synthesis include Cohen and Wallace’s Radiosity and Realistic Image Synthesis (1993), Sillion and Puech’s Radiosity and Global Illumination (1994), and Ashdown’s Radiosity: A Programmer’s Perspective (1994), all of which primarily describe the finite-element radiosity method. The course notes from the Monte Carlo ray-tracing course at SIGGRAPH have a wealth of practical information (Jensen et al. 2001a, 2003), much of it still relevant, now nearly twenty years later.
In a paper on ray-tracing system design, Kirk and Arvo (1988) suggested many principles that have now become classic in renderer design. Their renderer was implemented as a core kernel that encapsulated the basic rendering algorithms and interacted with primitives and shading routines via a carefully constructed object-oriented interface. This approach made it easy to extend the system with new primitives and acceleration methods. pbrt’s design is based on these ideas.
To this day, a good reference on basic ray-tracer design is Introduction to Ray Tracing (Glassner 1989a), which describes the state of the art in ray tracing at that time and has a chapter by Heckbert that sketches the design of a basic ray tracer. More recently, Shirley and Morley’s Realistic Ray Tracing (2003) offers an easy-to-understand introduction to ray tracing and includes the complete source code to a basic ray tracer. Suffern’s book (2007) also provides a gentle introduction to ray tracing. Shirley’s Ray Tracing in One Weekend series (2020) is an accessible introduction to the joy of writing a ray tracer.
Researchers at Cornell University have developed a rendering testbed over many years; its design and overall structure were described by Trumbore, Lytle, and Greenberg (1993). Its predecessor was described by Hall and Greenberg (1983). This system is a loosely coupled set of modules and libraries, each designed to handle a single task (ray–object intersection acceleration, image storage, etc.) and written in a way that makes it easy to combine appropriate modules to investigate and develop new rendering algorithms. This testbed has been quite successful, serving as the foundation for much of the rendering research done at Cornell through the 1990s.
Radiance was the first widely available open source renderer based fundamentally on physical quantities. It was designed to perform accurate lighting simulation for architectural design. Ward described its design and history in a paper and a book (Ward 1994; Larson and Shakespeare 1998). Radiance is designed in the UNIX style, as a set of interacting programs, each handling a different part of the rendering process. This general type of rendering architecture was first described by Duff (1985).
Glassner’s (1993) Spectrum rendering architecture also focuses on physically based rendering, approached through a signal-processing-based formulation of the problem. It is an extensible system built with a plug-in architecture; pbrt’s approach of using parameter/value lists for initializing implementations of the main abstract interfaces is similar to Spectrum’s. One notable feature of Spectrum is that all parameters that describe the scene can be functions of time.
Slusallek and Seidel (1995, 1996; Slusallek 1996) described the Vision rendering system, which is also physically based and designed to support a wide variety of light transport algorithms. In particular, it had the ambitious goal of supporting both Monte Carlo and finite-element-based light transport algorithms.
Many papers have been written that describe the design and implementation of other rendering systems, including renderers for entertainment and artistic applications. The Reyes architecture, which forms the basis for Pixar’s RenderMan renderer, was first described by Cook et al. (1987), and a number of improvements to the original algorithm have been summarized by Apodaca and Gritz (2000). Gritz and Hahn (1996) described the BMRT ray tracer. The renderer in the Maya modeling and animation system was described by Sung et al. (1998), and some of the internal structure of the mental ray renderer is described in Driemeyer and Herken’s book on its API (Driemeyer and Herken 2002). The design of the high-performance Manta interactive ray tracer was described by Bigler et al. (2006).
OptiX introduced a particularly interesting design approach for high-performance ray tracing: it is based on doing JIT compilation at runtime to generate a specialized version of the ray tracer, intermingling user-provided code (such as for material evaluation and sampling) and renderer-provided code (such as high-performance ray–object intersection). It was described by Parker et al. (2010).
More recently, Eisenacher et al. discussed the ray sorting architecture of Disney’s Hyperion renderer (Eisenacher et al. 2013), and Lee et al. have written about the implementation of the MoonRay rendering system at DreamWorks (Lee et al. 2017). The implementation of the Iray ray tracer was described by Keller et al. (2017).
In 2018, a special issue of ACM Transactions on Graphics included papers describing the implementations of five rendering systems that are used for feature film production. These papers are full of details about the various renderers; reading them is time well spent. They include Burley et al.’s description of Disney’s Hyperion renderer (2018), Christensen et al. on Pixar’s modern RenderMan (2018), Fascione et al. describing Weta Digital’s Manuka (2018), Georgiev et al. on Solid Angle’s version of Arnold (2018), and Kulla et al. on the version of Arnold used at Sony Pictures Imageworks (2018).
Whereas standard rendering algorithms generate images from a 3D scene description, the Mitsuba 2 system is engineered around the corresponding inverse problem. It computes derivatives with respect to scene parameters using JIT-compiled kernels that efficiently run on GPUs and CPUs. These kernels are then used in the inner loop of an optimization algorithm to reconstruct 3D scenes that are consistent with user-provided input images. This topic is further discussed in Section 16.3.1. The system’s design and implementation was described by Nimier-David et al. (2019).
➊ 1.1 A good way to gain an understanding of pbrt is to follow the process of computing the radiance value for a single ray in a debugger. Build a version of pbrt with debugging symbols and set up your debugger to run pbrt with a not-too-complex scene. Set breakpoints in the ImageTileIntegrator::Render() method and trace through the process of how a ray is generated, how its radiance value is computed, and how its contribution is added to the image. The first time you do this, you may want to specify that only a single thread of execution should be used by providing --nthreads 1 as command-line arguments to pbrt; doing so ensures that all computation is done in the main processing thread, which may make it easier to understand what is going on, depending on how easy your debugger makes it to step through the program when it is running multiple threads. As you gain more understanding about the details of the system later in the book, repeat this process and trace through particular parts of the system more carefully.
ImageTileIntegrator::Render() 25
_________________
1 The example code in this section is merely illustrative and is not part of pbrt itself.
2 Although digital sensors are now more common than physical film, we will use “film” to encompass both in cases where either could be used.
3 Although ray tracing’s logarithmic complexity is often heralded as one of its key strengths, this complexity is typically only true on average. A number of ray-tracing algorithms that have guaranteed logarithmic running time have been published in the computational geometry literature, but these algorithms only work for certain types of scenes and have very expensive preprocessing and storage requirements. Szirmay-Kalos and Márton provide pointers to the relevant literature (Szirmay-Kalos and Márton 1998). In practice, the ray intersection algorithms presented in this book are sublinear, but without expensive preprocessing and huge memory usage it is always possible to construct worst-case scenes where ray tracing runs in O(mn) time. One consolation is that scenes representing realistic environments generally do not exhibit this worst-case behavior.
4 At the time of writing, these capabilities are only available on NVIDIA hardware, but it would not be too difficult to port pbrt to other architectures that provide them in the future.
5 It would be easy enough to check if the BSDF was only reflective and to only sample directions on the same side of the surface as the ray, but for this simple integrator we will not bother.
6 Because pmr::polymorphic_allocator is a recent addition to C++ that is not yet widely used elsewhere, yet is used extensively in pbrt, we break our regular habit of not documenting standard library functionality in the text here.
7 Exceptions include the fact that we try to load image maps and binary geometry files in parallel, some image resampling performed on texture images, and construction of one variant of the BVHAggregate, though all of these are highly localized.
Rendering is full of integration problems. In addition to the light transport equation (1.1), in the following chapters we will see that integral equations also describe a variety of additional quantities related to light, including the sensor response in a camera, the attenuation and scattering of light in participating media, and scattering from materials like skin. These integral equations generally do not have analytic solutions, so we must turn to numerical methods. Although standard numerical integration techniques like trapezoidal integration or Gaussian quadrature are effective at solving low-dimensional smooth integrals, their rate of convergence is poor for the higher dimensional and discontinuous integrals that are common in rendering. Monte Carlo integration techniques provide one solution to this problem. They use random sampling to evaluate integrals with a convergence rate that is independent of the dimensionality of the integrand.
Monte Carlo integration1 has the useful property that it only requires the ability to evaluate an integrand f(x) at arbitrary points in the domain in order to estimate the value of its integral ∫ f(x) dx. This property not only makes Monte Carlo easy to implement but also makes the technique applicable to a broad variety of integrands. It has a natural extension to multidimensional functions; in Chapter 13, we will see that the light transport algorithm implemented in the RandomWalkIntegrator can be shown to be estimating the value of an infinite-dimensional integral.
Judicious use of randomness has revolutionized the field of algorithm design. Randomized algorithms fall broadly into two classes: Las Vegas and Monte Carlo. Las Vegas algorithms are those that use randomness but always give the same result in the end (e.g., choosing a random array entry as the pivot element in Quicksort). Monte Carlo algorithms, on the other hand, give different results depending on the particular random numbers used along the way but give the right answer on average. So, by averaging the results of several runs of a Monte Carlo algorithm (on the same input), it is possible to find a result that is statistically very likely to be close to the true answer.
RandomWalkIntegrator 33
The following sections discuss the basic principles of Monte Carlo integration, focusing on those that are widely used in pbrt. See also Appendix A, which has the implementations of additional Monte Carlo sampling functions that are more rarely used in the system.
Because Monte Carlo integration is based on randomization, we will start this chapter with a brief review of ideas from probability and statistics that provide the foundations of the approach. Doing so will allow us to introduce the basic Monte Carlo algorithm as well as mathematical tools for evaluating its error.
2.1.1 BACKGROUND AND PROBABILITY REVIEW
We will start by defining some terms and reviewing basic ideas from probability. We assume that the reader is already familiar with basic probability concepts; readers needing a more complete introduction to this topic should consult a textbook such as Sheldon Ross’s Introduction to Probability Models (2002).
A random variable X is a value chosen by some random process. We will generally use capital letters to denote random variables, with exceptions made for a few Greek symbols that represent special random variables. Random variables are always drawn from some domain, which can be either discrete (e.g., a fixed, finite set of possibilities) or continuous (e.g., the real numbers ℝ). Applying a function f to a random variable X results in a new random variable Y = f(X).
For example, the result of a roll of a die is a discrete random variable sampled from the set of events Xi ∈ {1, 2, 3, 4, 5, 6}. Each event has a probability pi = 1/6, and the sum of probabilities ∑i pi is necessarily one. A random variable like this one that has the same probability for all of its potential values is said to be uniform. A function p(X) that gives a discrete random variable’s probability is termed a probability mass function (PMF), and so we could equivalently write
p(X) = 1/6
in this case.
Two random variables are independent if the probability of one does not affect the probability of the other. In this case, the joint probability p(X, Y) of two random variables is given by the product of their probabilities:
p(X, Y) = p(X) p(Y).
For example, two random variables representing random samples of the six sides of a die are independent.
For dependent random variables, one’s probability affects the other’s. Consider a bag filled with some number of black balls and some number of white balls. If we randomly choose two balls from the bag, the probability of the second ball being white is affected by the color of the first ball since its choice changes the number of balls of one type left in the bag. We will say that the second ball’s probability is conditioned on the choice of the first one. In this case, the joint probability for choosing two balls X and Y is given by
p(X, Y) = p(Y|X) p(X),   (2.1)
where p(Y|X) is the conditional probability of Y given a value of X.
BVHLightSampler 796
In the following, it will often be the case that a random variable’s probability is conditioned on many values; for example, when choosing a light source from which to sample illumination, the BVHLightSampler in Section 12.6.3 considers the 3D position of the receiving point and its surface normal, and so the choice of light is conditioned on them. However, we will often omit the variables that a random variable is conditioned on in cases where there are many of them and where enumerating them would obscure notation.
A particularly important random variable is the canonical uniform random variable, which we will write as ξ. This variable takes on all values in its domain [0, 1) independently and with uniform probability. This particular variable is important for two reasons. First, it is easy to generate a variable with this distribution in software—most runtime libraries have a pseudo-random number generator that does just that.2 Second, we can take the canonical uniform random variable ξ and map it to a discrete random variable, choosing Xi if
p1 + ⋯ + pi−1 ≤ ξ < p1 + ⋯ + pi.   (2.2)
For lighting applications, we might want to define the probability of sampling illumination from each light in the scene based on its power Φi relative to the total power from all sources:
pi = Φi / ∑j Φj.
Notice that these pi values also sum to 1. Given such per-light probabilities, ξ could be used to select a light source from which to sample illumination.
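For illustration, here is a minimal standalone sketch (not part of pbrt) of this power-proportional selection; the light powers, the use of std::mt19937, and the helper name SelectLightByPower are assumptions made for the example.

#include <cstdio>
#include <random>
#include <vector>

// Return the index of a light chosen with probability proportional to its
// power, given a canonical uniform sample xi in [0, 1). (Illustrative only.)
int SelectLightByPower(const std::vector<double> &lightPower, double xi) {
    double totalPower = 0;
    for (double phi : lightPower)
        totalPower += phi;
    // Walk the per-light probabilities p_i = Phi_i / totalPower until the
    // running sum passes xi.
    double sum = 0;
    for (size_t i = 0; i < lightPower.size(); ++i) {
        sum += lightPower[i] / totalPower;
        if (xi < sum)
            return int(i);
    }
    return int(lightPower.size()) - 1;  // guard against round-off
}

int main() {
    std::vector<double> lightPower = {10, 30, 60};  // Phi_i for three lights
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int counts[3] = {0, 0, 0};
    for (int i = 0; i < 100000; ++i)
        ++counts[SelectLightByPower(lightPower, u(rng))];
    // Expect roughly 10%, 30%, and 60% of the selections, respectively.
    std::printf("%d %d %d\n", counts[0], counts[1], counts[2]);
}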
The cumulative distribution function (CDF) P(x) of a random variable is the probability that a value from the variable’s distribution is less than or equal to some value x:
P(x) = Pr{X ≤ x}.   (2.3)
For the die example, P(2) = 1/3, since two of the six possibilities are less than or equal to 2.
Continuous random variables take on values over ranges of continuous domains (e.g., the real numbers, directions on the unit sphere, or the surfaces of shapes in the scene). Beyond ξ, another example of a continuous random variable is the random variable that ranges over the real numbers between 0 and 2, where the probability of its taking on any particular value x is proportional to the value 2 − x: it is twice as likely for this random variable to take on a value around 0 as it is to take one around 1, and so forth.
The probability density function (PDF) formalizes this idea: it describes the relative probability of a random variable taking on a particular value and is the continuous analog of the PMF. The PDF p(x) is the derivative of the random variable’s CDF,
p(x) = dP(x)/dx.
For uniform random variables, p(x) is a constant; this is a direct consequence of uniformity. For ξ we have
p(x) = 1 for x ∈ [0, 1), and p(x) = 0 otherwise.
PDFs are necessarily nonnegative and always integrate to 1 over their domains. Note that their value at a point x is not necessarily less than 1, however.
Given an interval [a, b] in the domain, integrating the PDF gives the probability that a random variable lies inside the interval:
P(x ∈ [a, b]) = ∫_a^b p(x) dx = P(b) − P(a).
This follows directly from the first fundamental theorem of calculus and the definition of the PDF.
The expected value Ep[f(x)] of a function f is defined as the average value of the function over some distribution of values p(x) over its domain D:
Ep[f(x)] = ∫_D f(x) p(x) dx.
As an example, consider finding the expected value of the cosine function between 0 and π, where p is uniform. Because the PDF p(x) must integrate to 1 over the domain, p(x) = 1/π, so3
E[cos x] = ∫_0^π (cos x)/π dx = (1/π)(sin π − sin 0) = 0,
which is precisely the expected result. (Consider the graph of cos x over [0, π] to see why this is so.)
The expected value has a few useful properties that follow from its definition:
E[a f(x)] = a E[f(x)],   (2.4)
E[∑i f(Xi)] = ∑i E[f(Xi)].   (2.5)
We will repeatedly use these properties in derivations in the following sections.
2.1.3 THE MONTE CARLO ESTIMATOR
We can now define the Monte Carlo estimator, which approximates the value of an arbitrary integral. Suppose that we want to evaluate a 1D integral ∫_a^b f(x) dx. Given a supply of independent uniform random variables Xi ∈ [a, b], the Monte Carlo estimator says that the expected value of the estimator
Fn = (b − a)/n ∑i f(Xi),   (2.6)
E[Fn], is equal to the integral. This fact can be demonstrated with just a few steps. First, note that the PDF p(x) corresponding to the random variable Xi must be equal to 1/(b − a), since p must not only be a constant but also integrate to 1 over the domain [a, b]. Algebraic manipulation using the properties from Equations (2.4) and (2.5) then shows that
E[Fn] = E[(b − a)/n ∑i f(Xi)]
= (b − a)/n ∑i E[f(Xi)]
= (b − a)/n ∑i ∫_a^b f(x) p(x) dx
= 1/n ∑i ∫_a^b f(x) dx
= ∫_a^b f(x) dx.
Extending this estimator to multiple dimensions or complex integration domains is straightforward: n independent samples Xi are taken from a uniform multidimensional PDF, and the estimator is applied in the same way. For example, consider the 3D integral
∫_{z0}^{z1} ∫_{y0}^{y1} ∫_{x0}^{x1} f(x, y, z) dx dy dz.
If samples Xi = (xi, yi, zi) are chosen uniformly from the box [x0, x1] × [y0, y1] × [z0, z1], then the PDF p(X) is the constant value
1 / ((x1 − x0)(y1 − y0)(z1 − z0)),
and the estimator is
Fn = (x1 − x0)(y1 − y0)(z1 − z0)/n ∑i f(Xi).
The restriction to uniform random variables can be relaxed with a small generalization. This is an important step, since carefully choosing the PDF from which samples are drawn leads to a key technique for reducing error in Monte Carlo that will be introduced in Section 2.2.2. If the random variables Xi are drawn from a PDF p(x), then the estimator
Fn = 1/n ∑i f(Xi)/p(Xi)   (2.7)
can be used to estimate the integral instead. The only limitation on p(x) is that it must be nonzero for all x where |f(x)| > 0.
It is similarly not too hard to see that the expected value of this estimator is the desired integral of f:
E[Fn] = E[1/n ∑i f(Xi)/p(Xi)]
= 1/n ∑i ∫ (f(x)/p(x)) p(x) dx
= 1/n ∑i ∫ f(x) dx
= ∫ f(x) dx.
We can now understand the factor of 1/(4π) in the implementation of the RandomWalk Integrator: directions are uniformly sampled over the unit sphere, which has area 4π. Because the PDF is normalized over the sampling domain, it must have the constant value 1/(4π). When the estimator of Equation (2.7) is applied, that value appears in the divisor.
With Monte Carlo, the number of samples n can be chosen arbitrarily, regardless of the dimensionality of the integrand. This is another important advantage of Monte Carlo over traditional deterministic quadrature techniques, which typically require a number of samples that is exponential in the dimension.
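For concreteness, the following standalone sketch (not part of pbrt) applies the uniform estimator of Equation (2.6) to ∫_0^π sin x dx, whose exact value is 2; the sample count and random number generator are arbitrary choices for the example.

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    // Estimate the integral of sin(x) over [a, b] = [0, pi] (exact value 2)
    // with the uniform estimator of Equation (2.6): Fn = (b - a)/n sum_i f(Xi).
    const double pi = 3.14159265358979323846;
    const double a = 0, b = pi;
    const int n = 1000000;
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> uniform(a, b);
    double sum = 0;
    for (int i = 0; i < n; ++i)
        sum += std::sin(uniform(rng));
    std::printf("estimate = %f (exact = 2)\n", (b - a) / n * sum);
}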
2.1.4 ERROR IN MONTE CARLO ESTIMATORS
Showing that the Monte Carlo estimator converges to the right answer is not enough to justify its use; its rate of convergence is important too. Variance, the expected squared deviation of a function from its expected value, is a useful way to characterize Monte Carlo estimators’ convergence. The variance of an estimator F is defined as
V[F] = E[(F − E[F])²],   (2.8)
from which it follows that
V[aF] = a² V[F].
This property and Equation (2.5) yield an alternative expression for the variance:
V[F] = E[F²] − E[F]².
Thus, the variance is the expected value of the square minus the square of the expected value.
If the estimator is a sum of independent random variables (like the Monte Carlo estimator Fn), then the variance of the sum is the sum of the individual random variables’ variances:
V[∑i Fi] = ∑i V[Fi].   (2.10)
From Equation (2.10) it is easy to show that variance decreases linearly with the number of samples n. Because variance is squared error, the error in a Monte Carlo estimate therefore only goes down at a rate of O(n−1/2) in the number of samples. Although standard quadrature techniques converge at a faster rate in one dimension, their performance becomes exponentially worse as the dimensionality of the integrand increases, while Monte Carlo’s convergence rate is independent of the dimension, making Monte Carlo the only practical numerical integration algorithm for high-dimensional integrals.
The O(n−1/2) characteristic of Monte Carlo’s rate of error reduction is apparent when watching a progressive rendering of a scene where additional samples are incrementally taken in all pixels. The image improves rapidly for the first few samples when doubling the number of samples is relatively little additional work. Later on, once tens or hundreds of samples have been taken, each additional sample doubling takes much longer and remaining error in the image takes a long time to disappear.
The linear decrease in variance with increasing numbers of samples makes it easy to compare different Monte Carlo estimators. Consider two estimators, where the second has half the variance of the first but takes three times as long to compute an estimate; which of the two is better? In that case, the first is preferable: it could take three times as many samples in the time consumed by the second, in which case it would achieve a 3× variance reduction. This concept can be encapsulated in the efficiency of an estimator F, which is defined as
ε[F] = 1 / (V[F] T[F]),
where V[F] is its variance and T[F] is the running time to compute its value.
Not all estimators of integrals have expected values that are equal to the value of the integral. Such estimators are said to be biased, where the difference
β = E[F] − ∫ f(x) dx
is the amount of bias. Biased estimators may still be desirable if they are able to get close to the correct result more quickly than unbiased estimators. Kalos and Whitlock (1986, pp. 36–37) gave the following example: consider the problem of computing an estimate of the mean value of a uniform distribution Xi ∼ p over the interval from 0 to 1. One could use the estimator
Fn = 1/n ∑i Xi,
or one could use the biased estimator
Fn = (1/2) max(X1, X2, …, Xn).
The first estimator is unbiased but has variance with order O(n−1). The second estimator’s expected value is
E[(1/2) max(X1, …, Xn)] = (1/2) · n/(n + 1) ≠ 1/2,
so it is biased, although its variance is O(n−2), which is much better. This estimator has the useful property that its error goes to 0 in the limit as the number of samples n goes to infinity; such estimators are consistent.4 Most of the Monte Carlo estimators used in pbrt are unbiased, with the notable exception of the SPPMIntegrator, which implements a photon mapping algorithm.
Closely related to the variance is the mean squared error (MSE), which is defined as the expectation of the squared difference of an estimator and the true value,
MSE[F] = E[(F − ∫ f(x) dx)²].
For an unbiased estimator, MSE is equal to the variance; otherwise it is the sum of variance and the squared bias of the estimator.
It is possible to work out the variance and MSE of some simple estimators in closed form, but for most of the ones of interest in rendering, this is not possible. Yet it is still useful to be able to quantify these values. For this purpose, the sample variance can be computed using a set of independent random variables Xi. Equation (2.8) points at one way to compute the sample variance for a set of n random variables Xi. If the sample mean is computed as their average, X̄ = (1/n) ∑i Xi, then the sample variance is
(1/(n − 1)) ∑i (Xi − X̄)².
The division by n − 1 rather than n is Bessel’s correction, which ensures that the sample variance is an unbiased estimate of the variance. (See also Section B.2.11, where a numerically stable approach for computing the sample variance is introduced.)
The sample variance is itself an estimate of the variance, so it has variance itself. Consider, for example, a random variable that has a value of 1 99.99% of the time, and a value of one million 0.01% of the time. If we took ten random samples of it that all had the value 1, the sample variance would suggest that the random variable had zero variance even though its variance is actually much higher.
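A minimal standalone sketch (not pbrt code) of the two-pass sample variance computation just described follows; the numerically stable alternative mentioned above is the better choice in practice.

#include <cstdio>
#include <vector>

// Two-pass computation of the Bessel-corrected sample variance of a set of
// estimates, as described in the text.
double SampleVariance(const std::vector<double> &x) {
    if (x.size() < 2)
        return 0;
    double mean = 0;
    for (double v : x)
        mean += v;
    mean /= x.size();
    double sumSq = 0;
    for (double v : x)
        sumSq += (v - mean) * (v - mean);
    return sumSq / (x.size() - 1);  // divide by n - 1, not n
}

int main() {
    std::vector<double> estimates = {1.0, 2.0, 4.0, 7.0};
    std::printf("sample variance = %f\n", SampleVariance(estimates));
}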
If an accurate estimate of the integral f̃ ≈ ∫ f(x) dx can be computed (for example, using a large number of samples), then the mean squared error can be estimated from n independent estimates Fi by
MSE[F] ≈ (1/n) ∑i (Fi − f̃)².
The imgtool utility program that is provided in pbrt’s distribution can compute an image’s MSE with respect to a reference image via its diff option.
Given an unbiased Monte Carlo estimator, we are in the fortunate position of having a reliable relationship between the number of samples taken and variance (and thus, error). If we have an unacceptably noisy rendered image, increasing the number of samples will reduce error in a predictable way, and—given enough computation—an image of sufficient quality can be generated.
However, computation takes time, and often there is not enough of it. The deadline for a movie may be at hand, or the sixtieth-of-a-second time slice in a real-time renderer may be coming to an end. Given the consequentially limited number of samples, the only option for variance reduction is to find ways to make more of the samples that can be taken. Fortunately, a variety of techniques have been developed to improve the basic Monte Carlo estimator by making the most of the samples that are taken; here we will discuss the most important ones that are used in pbrt.
A classic and effective family of techniques for variance reduction is based on the careful placement of samples in order to better capture the features of the integrand (or, more accurately, to be less likely to miss important features). These techniques are used extensively in pbrt. Stratified sampling decomposes the integration domain into regions and places samples in each one; here we will analyze that approach in terms of its variance reduction properties. Later, in Section 8.2.1, we will return with machinery based on Fourier analysis that provides further insights about it.
Stratified sampling subdivides the integration domain Λ into n nonoverlapping regions Λ1, Λ2, …, Λn. Each region is called a stratum, and they must completely cover the original domain:
Λ1 ∪ Λ2 ∪ ⋯ ∪ Λn = Λ.
To draw samples from Λ, we will draw ni samples from each Λi, according to densities pi inside each stratum. A simple example is supersampling a pixel. With stratified sampling, the area around a pixel is divided into a k × k grid, and a sample is drawn uniformly within each grid cell. This is better than taking k2 random samples, since the sample locations are less likely to clump together. Here we will show why this technique reduces variance.
Within a single stratum Λi, the Monte Carlo estimate is
Fi = (1/ni) ∑j f(Xi,j)/pi(Xi,j),
where Xi,j is the jth sample drawn from density pi. The overall estimate is F = ∑i vi Fi, where vi is the fractional volume of stratum i (vi ∈ (0, 1]).
The true value of the integrand in stratum i is
μi = E[f(Xi,j)] = (1/vi) ∫_{Λi} f(x) dx,
and the variance in this stratum is
σi² = (1/vi) ∫_{Λi} (f(x) − μi)² dx.
Thus, with ni samples in the stratum, the variance of the per-stratum estimator is σi²/ni. This shows that the variance of the overall estimator is
V[F] = V[∑i vi Fi] = ∑i V[vi Fi] = ∑i vi² σi²/ni.
If we make the reasonable assumption that the number of samples ni is proportional to the volume vi, then we have ni = vi n, and the variance of the overall estimator is
V[Fn] = (1/n) ∑i vi σi².
To compare this result to the variance without stratification, we note that choosing an unstratified sample is equivalent to choosing a random stratum I according to the discrete probability distribution defined by the volumes vi and then choosing a random sample X in I. In this sense, X is chosen conditionally on I, so it can be shown using conditional probability that
V[F] = (1/n) [∑i vi σi² + ∑i vi (μi − Q)²],   (2.12)
where Q is the mean of f over the whole domain Λ.5
Figure 2.1: Variance is higher and the image noisier (a) when independent random sampling is used than (b) when a stratified distribution of sample directions is used instead. (Bunny model courtesy of the Stanford Computer Graphics Laboratory.)
There are two things to notice about Equation (2.12). First, we know that the right-hand sum must be nonnegative, since variance is always nonnegative. Second, it demonstrates that stratified sampling can never increase variance. Stratification always reduces variance unless the right-hand sum is exactly 0. It can only be 0 when the function f has the same mean over each stratum Λi. For stratified sampling to work best, we would like to maximize the right-hand sum, so it is best to make the strata have means that are as unequal as possible. This explains why compact strata are desirable if one does not know anything about the function f. If the strata are wide, they will contain more variation and will have μi closer to the true mean Q.
Figure 2.1 shows the effect of using stratified sampling versus an independent random distribution for sampling when rendering an image that includes glossy reflection. There is a reasonable reduction in variance at essentially no cost in running time.
The main downside of stratified sampling is that it suffers from the same “curse of dimensionality” as standard numerical quadrature. Full stratification in D dimensions with S strata per dimension requires SD samples, which quickly becomes prohibitive. Fortunately, it is often possible to stratify some of the dimensions independently and then randomly associate samples from different dimensions; this approach will be used in Section 8.5. Choosing which dimensions are stratified should be done in a way that stratifies dimensions that tend to be most highly correlated in their effect on the value of the integrand (Owen 1998).
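As an illustration, here is a minimal standalone sketch (not part of pbrt) of the k × k jittered supersampling of a pixel described at the start of this section; the use of std::mt19937 and the choice k = 4 are assumptions made for the example.

#include <cstdio>
#include <random>
#include <vector>

struct Sample2D { double x, y; };

// Generate k*k stratified ("jittered") sample positions inside a pixel's
// [0,1)^2 area: the domain is divided into a k x k grid and one uniform
// sample is drawn inside each grid cell.
std::vector<Sample2D> JitteredSamples(int k, std::mt19937 &rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<Sample2D> samples;
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < k; ++i)
            samples.push_back({(i + u(rng)) / k, (j + u(rng)) / k});
    return samples;
}

int main() {
    std::mt19937 rng(13);
    for (const Sample2D &s : JitteredSamples(4, rng))
        std::printf("(%.3f, %.3f)\n", s.x, s.y);
}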
2.2.2 IMPORTANCE SAMPLING
Importance sampling is a powerful variance reduction technique that exploits the fact that the Monte Carlo estimator
Fn = (1/n) ∑i f(Xi)/p(Xi)
converges more quickly if the samples are taken from a distribution p(x) that is similar to the function f(x) in the integrand. In this case, samples are more likely to be taken when the magnitude of the integrand is relatively large. Importance sampling is one of the most frequently used variance reduction techniques in rendering, since it is easy to apply and is very effective when good sampling distributions are used.
To see why such sampling distributions reduce error, first consider the effect of using a distribution p(x) ∝ f(x), or p(x) = cf(x).6 It is trivial to show that normalization of the PDF requires that
c = 1 / ∫ f(x) dx.
Finding such a PDF requires that we know the value of the integral, which is what we were trying to estimate in the first place. Nonetheless, if we could sample from this distribution, each term of the sum in the estimator would have the value
f(Xi)/p(Xi) = f(Xi)/(c f(Xi)) = 1/c = ∫ f(x) dx.
The variance of the estimator is zero! Of course, this is ludicrous since we would not bother using Monte Carlo if we could integrate f directly. However, if a density p(x) can be found that is similar in shape to f(x), variance is reduced.
As a more realistic example, consider the Gaussian function f(x) = e^(−1000 (x − 1/4)²), which is plotted in Figure 2.2(a) over [0, 1]. Its value is close to zero over most of the domain. Samples X with X < 0.2 or X > 0.3 are of little help in estimating the value of the integral since they give no information about the magnitude of the bump in the function’s value around 1/4. With uniform sampling and the basic Monte Carlo estimator, variance is approximately 0.0365.
If samples are instead drawn from the piecewise-constant distribution
which is plotted in Figure 2.2(b), and the estimator from Equation (2.7) is used instead, then variance is reduced by a factor of approximately 6.7×. A representative set of 6 points from this distribution is shown in Figure 2.2(c); we can see that most of the evaluations of f(x) are in the interesting region where it is not nearly zero.
Figure 2.2: (a) A narrow Gaussian function that is close to zero over most of the range [0, 1]. The basic Monte Carlo estimator of Equation (2.6) has relatively high variance if it is used to integrate this function, since most samples have values that are close to zero. (b) A PDF that roughly approximates the function’s distribution. If this PDF is used to generate samples, variance is reduced substantially. (c) A representative distribution of samples generated according to (b).
Importance sampling can increase variance if a poorly chosen distribution is used, however. Consider instead using the distribution
for estimating the integral of the Gaussian function. This PDF increases the probability of sampling the function where its value is close to zero and decreases the probability of sampling it where its magnitude is larger.
Not only does this PDF generate fewer samples where the integrand is large, but when it does, the magnitude of f(x)/p(x) in the Monte Carlo estimator will be especially high since p(x) = 0.2 in that region. The result is approximately 5.4× higher variance than uniform sampling, and nearly 36× higher variance than the better PDF above. In the context of Monte Carlo integration for rendering where evaluating the integrand generally involves the expense of tracing a ray, it is desirable to minimize the number of samples taken; using an inferior sampling distribution and making up for it by evaluating more samples is an unappealing option.
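To make this concrete, the following standalone sketch (not part of pbrt) estimates the integral of the Gaussian above both with uniform sampling and with importance sampling from a piecewise-constant PDF of our own choosing; the values 8 and 2/9 below are illustrative assumptions, not the distribution plotted in Figure 2.2(b).

#include <cmath>
#include <cstdio>
#include <random>

// The narrow Gaussian integrand from the text.
double f(double x) { double d = x - 0.25; return std::exp(-1000 * d * d); }

// A hypothetical piecewise-constant PDF concentrated around the bump:
// p(x) = 8 on [0.2, 0.3) and 2/9 on the rest of [0, 1]; it integrates to 1.
double pdf(double x) { return (x >= 0.2 && x < 0.3) ? 8.0 : 2.0 / 9.0; }

double samplePdf(double u) {
    // With probability 0.8, sample uniformly inside [0.2, 0.3); otherwise
    // sample uniformly over the remaining length-0.9 region.
    if (u < 0.8)
        return 0.2 + (u / 0.8) * 0.1;
    double t = (u - 0.8) / 0.2 * 0.9;   // uniform over [0, 0.9)
    return (t < 0.2) ? t : t + 0.1;     // skip over [0.2, 0.3)
}

int main() {
    const int n = 1000000;
    std::mt19937 rng(3);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sumUniform = 0, sumImportance = 0;
    for (int i = 0; i < n; ++i) {
        sumUniform += f(uniform(rng));         // Equation (2.6) with b - a = 1
        double x = samplePdf(uniform(rng));
        sumImportance += f(x) / pdf(x);        // Equation (2.7)
    }
    std::printf("uniform: %f, importance: %f\n", sumUniform / n,
                sumImportance / n);
}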
2.2.3 MULTIPLE IMPORTANCE SAMPLING
We are frequently faced with integrals that are the product of two or more functions: ∫ fa(x)fb(x) dx. It is often possible to derive separate importance sampling strategies for the individual factors, though not one that is similar to their product. This situation is especially common in the integrals involved with light transport, such as in the product of BSDF, incident radiance, and a cosine factor in the light transport equation (1.1).
To understand the challenges involved with applying Monte Carlo to such products, assume for now the good fortune of having two sampling distributions pa and pb that match the distributions of fa and fb exactly. (In practice, this will not normally be the case.) With the Monte Carlo estimator of Equation (2.7), we have two options: we might draw samples using pa, which gives the estimator
F = fa(X) fb(X)/pa(X) = c fb(X),
where c is a constant equal to the integral of fa, since pa(x) ∝ fa(x). The variance of this estimator is proportional to the variance of fb, which may itself be high.7 Conversely, we might sample from pb, though doing so gives us an estimator with variance proportional to the variance of fa, which may similarly be high. In the more common case where the sampling distributions only approximately match one of the factors, the situation is usually even worse.
Unfortunately, the obvious solution of taking some samples from each distribution and averaging the two estimators is not much better. Because variance is additive, once variance has crept into an estimator, we cannot eliminate it by adding it to another low-variance estimator.
Multiple importance sampling (MIS) addresses exactly this issue, with an easy-to-implement variance reduction technique. The basic idea is that, when estimating an integral, we should draw samples from multiple sampling distributions, chosen in the hope that at least one of them will match the shape of the integrand reasonably well, even if we do not know which one this will be. MIS then provides a method to weight the samples from each technique that can eliminate large variance spikes due to mismatches between the integrand’s value and the sampling density. Specialized sampling routines that only account for unusual special cases are even encouraged, as they reduce variance when those cases occur, with relatively little cost in general.
With two sampling distributions pa and pb and a single sample taken from each one, X ∼ pa and Y ∼ pb, the MIS Monte Carlo estimator is
F = wa(X) f(X)/pa(X) + wb(Y) f(Y)/pb(Y),   (2.13)
where wa and wb are weighting functions chosen such that the expected value of this estimator is the value of the integral of f(x).
More generally, given n sampling distributions pi with ni samples Xi,j taken from the ith distribution, the MIS Monte Carlo estimator is
F = ∑i (1/ni) ∑j wi(Xi,j) f(Xi,j)/pi(Xi,j).
(The full set of conditions on the weighting functions for the estimator to be unbiased are that they sum to 1 whenever f(x) ≠ 0, ∑i wi(x) = 1, and that wi(x) = 0 if pi(x) = 0.)
Setting wi(x) = 1/n corresponds to the case of summing the various estimators, which we have already seen is an ineffective way to reduce variance. It would be better if the weighting functions were relatively large when the corresponding sampling technique was a good match to the integrand and relatively small when it was not, thus reducing the contribution of high-variance samples.
In practice, a good choice for the weighting functions is given by the balance heuristic, which attempts to fulfill this goal by taking into account all the different ways that a sample could have been generated, rather than just the particular one that was used to do so. The balance heuristic’s weighting function for the ith sampling technique is
wi(x) = ni pi(x) / ∑j nj pj(x).   (2.14)
With the balance heuristic and our example of taking a single sample from each of two sampling techniques, the estimator of Equation (2.13) works out to be
F = f(X)/(pa(X) + pb(X)) + f(Y)/(pa(Y) + pb(Y)).
Each evaluation of f is divided by the sum of all PDFs for the corresponding sample rather than just the one that generated the sample. Thus, if pa generates a sample with low probability at a point where the pb has a higher probability, then dividing by pa(X) + pb(X) reduces the sample’s contribution. Effectively, such samples are downweighted when sampled from pa, recognizing that the sampling technique associated with pb is more effective at the corresponding point in the integration domain. As long as just one of the sampling techniques has a reasonable probability of sampling a point where the function’s value is large, the MIS weights can lead to a significant reduction in variance.
BalanceHeuristic() computes Equation (2.14) for the specific case of two distributions pa and pb. We will not need a more general multidistribution case in pbrt.
〈Sampling Inline Functions〉 ≡
Float BalanceHeuristic(int nf, Float fPdf, int ng, Float gPdf) {
return (nf * fPdf) / (nf * fPdf + ng * gPdf);
}
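To show how these weighting functions are used, here is a hedged standalone sketch (not pbrt code) that applies the balance heuristic to combine one sample from each of two sampling strategies, in the style of Equation (2.13). The integrand and the two strategies are contrived for the example, and BalanceHeuristic() is restated with double in place of Float so the sketch is self-contained.

#include <cmath>
#include <cstdio>
#include <random>

// Integrand f(x) = f_a(x) * f_b(x) with f_a(x) = 2x and f_b(x) = 3x^2;
// its integral over [0, 1] is 3/2.
double f(double x) { return 2 * x * 3 * x * x; }

// Two sampling strategies that each match one factor exactly.
double pdfA(double x) { return 2 * x; }               // p_a(x) = 2x
double sampleA(double u) { return std::sqrt(u); }     // X = P_a^{-1}(u)
double pdfB(double x) { return 3 * x * x; }           // p_b(x) = 3x^2
double sampleB(double u) { return std::cbrt(u); }     // X = P_b^{-1}(u)

double BalanceHeuristic(int nf, double fPdf, int ng, double gPdf) {
    return (nf * fPdf) / (nf * fPdf + ng * gPdf);
}

int main() {
    const int n = 1000000;
    std::mt19937 rng(5);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        // One sample from each strategy, weighted per Equation (2.13).
        double x = sampleA(uniform(rng));
        sum += BalanceHeuristic(1, pdfA(x), 1, pdfB(x)) * f(x) / pdfA(x);
        double y = sampleB(uniform(rng));
        sum += BalanceHeuristic(1, pdfB(y), 1, pdfA(y)) * f(y) / pdfB(y);
    }
    std::printf("MIS estimate = %f (exact = 1.5)\n", sum / n);
}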
In practice, the power heuristic often reduces variance even further. For an exponent β, the power heuristic is
wi(x) = (ni pi(x))^β / ∑j (nj pj(x))^β.
Note that the power heuristic has a similar form to the balance heuristic, though it further reduces the contribution of relatively low probabilities. Our implementation hard-codes β = 2; that parameter value usually works well in practice.
Float 23
Sqr() 1034
〈Sampling Inline Functions〉 +≡
Float PowerHeuristic(int nf, Float fPdf, int ng, Float gPdf) {
Float f = nf * fPdf, g = ng * gPdf;
return Sqr(f) / (Sqr(f) + Sqr(g));
}
Multiple importance sampling can be applied even without sampling from all the distributions. This approach is known as the single sample model. We will not include the derivation here, but it can be shown that given an integrand f(x), if a sampling technique pi is chosen from a set of techniques with probability qi and a sample X is drawn from pi, then the single sample estimator
F = wi(X) f(X) / (qi pi(X))
gives an unbiased estimate of the integral. For the single sample model, the balance heuristic is provably optimal.
One shortcoming of multiple importance sampling is that if one of the sampling techniques is a very good match to the integrand, MIS can slightly increase variance. For rendering applications, MIS is almost always worthwhile for the variance reduction it provides in cases that can otherwise have high variance.
MIS Compensation
Multiple importance sampling is generally applied using probability distributions that are all individually valid for importance sampling the integrand, with nonzero probability of generating a sample anywhere that the integrand is nonzero. However, when MIS is being used, it is not a requirement that all PDFs are nonzero where the function’s value is nonzero; only one of them must be.
This observation led to the development of a technique called MIS compensation, which can further reduce variance. It is motivated by the fact that if all the sampling distributions allocate some probability to sampling regions where the integrand’s value is small, it is often the case that that region of the integrand ends up being oversampled, leaving the region where the integrand is high undersampled.
MIS compensation is based on the idea of sharpening one or more (but not all) of the probability distributions—for example, by adjusting them to have zero probability in areas where they earlier had low probability. A new sampling distribution p′ can, for example, be defined by
p′(x) = max(0, p(x) − δ) / ∫ max(0, p(x′) − δ) dx′
for some fixed value δ.
This technique is especially easy to apply in the case of tabularized sampling distributions. In Section 12.5, it is used to good effect for sampling environment map light sources.
2.2.4 RUSSIAN ROULETTE AND SPLITTING
Russian roulette is a technique that can improve the efficiency of Monte Carlo estimates by skipping the evaluation of samples that would make a small contribution to the final result. In rendering, we often have estimators of the form
(1/n) ∑i f(Xi) v(Xi) / p(Xi),
where the integrand consists of some factors f(X) that are easily evaluated (e.g., those that relate to how the surface scatters light) and others that are more expensive to evaluate, such as a binary visibility factor v(X) that requires tracing a ray. In these cases, most of the computational expense of evaluating the estimator lies in v.
If f(X) is zero, it is obviously worth skipping the work of evaluating v(X), since its value will not affect the value of the estimator. However, if we also skipped evaluating estimators where f(X) was small but nonzero, then we would introduce bias into the estimator and would systemically underestimate the value of the integrand. Russian roulette solves this problem, making it possible to also skip tracing rays when f(X)’s value is small but not necessarily 0, while still computing the correct value on average.
To apply Russian roulette, we select some termination probability q. This value can be chosen in almost any manner; for example, it could be based on an estimate of the value of the integrand for the particular sample chosen, increasing as the integrand’s value becomes smaller. With probability q, the estimator is not evaluated for the particular sample, and some constant value c is used in its place (c = 0 is often used). With probability 1 − q, the estimator is still evaluated but is weighted by the factor 1/(1 − q), which effectively compensates for the samples that were skipped.
We have the new estimator
F′ = (F − qc)/(1 − q) with probability 1 − q, and F′ = c otherwise.
It is easy to see that its expected value is the same as the expected value of the original estimator:
E[F′] = (1 − q) (E[F] − qc)/(1 − q) + q c = E[F].
Russian roulette never reduces variance. In fact, unless somehow c = F, it will always increase variance. However, it does improve Monte Carlo efficiency if the probabilities are chosen so that samples that are likely to make a small contribution to the final result are skipped.
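A minimal standalone sketch (not part of pbrt) of Russian roulette applied to a toy estimator follows; the termination probability q = 0.75 and replacement constant c = 0 are arbitrary choices for the example, and the "expensive" evaluation is just a call to std::exp().

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    // Estimate the integral of exp(-x) over [0, 1] (exact value 1 - 1/e),
    // applying Russian roulette with termination probability q and
    // replacement constant c, following the estimator given in the text.
    const double q = 0.75, c = 0.0;
    const int n = 1000000;
    std::mt19937 rng(11);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0;
    int evaluated = 0;
    for (int i = 0; i < n; ++i) {
        if (uniform(rng) < q) {
            sum += c;                             // skip the evaluation
        } else {
            double F = std::exp(-uniform(rng));   // the original estimate
            sum += (F - q * c) / (1 - q);         // reweight surviving samples
            ++evaluated;
        }
    }
    std::printf("estimate = %f (exact = %f), evaluated %d of %d samples\n",
                sum / n, 1 - std::exp(-1.0), evaluated, n);
}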
While Russian roulette reduces the number of samples, splitting increases the number of samples in some dimensions of multidimensional integrals in order to improve efficiency. As an example, consider an integral of the general form
∫_A ∫_B f(x, y) dy dx.   (2.17)
With the standard importance sampling estimator, we might draw n samples from independent distributions, Xi ∼ px and Yi ∼ py, and compute
(1/n) ∑i f(Xi, Yi) / (px(Xi) py(Yi)).   (2.18)
Splitting allows us to formalize the idea of taking more than one sample for the integral over B for each sample taken in A. With splitting, we might take m samples Yi,j for each sample Xi, giving the estimator
(1/n) ∑i (1/m) ∑j f(Xi, Yi,j) / (px(Xi) py(Yi,j)).
If it is possible to partially evaluate f(Xi, ·) for each Xi, then we can compute a total of nm samples more efficiently than if we had taken nm independent samples using the estimator of Equation (2.18).
For an example from rendering, an integral of the form of Equation (2.17) is evaluated to compute the color of pixels in an image: an integral is taken over the area of the pixel A where at each point in the pixel x, a ray is traced into the scene and the reflected radiance at the intersection point is computed using an integral over the hemisphere (denoted here by B) for which one or more rays are traced. With splitting, we can take multiple samples for each lighting integral, improving efficiency by amortizing the cost of tracing the initial ray from the camera over them.
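The following standalone sketch (not part of pbrt) illustrates splitting on a toy 2D integral, taking m inner samples for each outer sample; the integrand and the sample counts are arbitrary choices for the example.

#include <cstdio>
#include <random>

// Splitting: take m samples of the inner (y) integral for each outer (x)
// sample, as in the estimator given in the text. Here f(x, y) = x * y over
// [0,1]^2, whose integral is 1/4, and both PDFs are uniform.
int main() {
    const int n = 10000, m = 16;
    std::mt19937 rng(17);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        double x = uniform(rng);        // e.g., a position within the pixel
        double inner = 0;
        for (int j = 0; j < m; ++j)     // e.g., multiple lighting samples
            inner += x * uniform(rng);  // f(x, y_j)
        sum += inner / m;
    }
    std::printf("estimate = %f (exact = 0.25)\n", sum / n);
}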
2.3 SAMPLING USING THE INVERSION METHOD
To evaluate the Monte Carlo estimator in Equation (2.7), it is necessary to be able to draw random samples from a chosen probability distribution. There are a variety of techniques for doing so, but one of the most important for rendering is the inversion method, which maps uniform samples from [0, 1) to a given 1D probability distribution by inverting the distribution’s CDF. (In Section 2.4.2 we will see how this approach can be applied to higher-dimensional functions by considering a sequence of 1D distributions.) When used with well-distributed samples such as those generated by the samplers that are defined in Chapter 8, the inversion method can be particularly effective. Throughout the remainder of the book, we will see the application of the inversion method to generate samples from the distributions defined by BSDFs, light sources, cameras, and scattering media.
Equation (2.2) leads to an algorithm for sampling from a set of discrete probabilities using a uniform random variable. Suppose we have a process with four possible outcomes where the probabilities of each of the four outcomes are given by p1, p2, p3, and p4, with ∑i pi = 1. The corresponding PMF is shown in Figure 2.3.
There is a direct connection between the sums in Equation (2.2) and the definition of the CDF. The discrete CDF is given by
P(i) = p1 + ⋯ + pi,
which can be interpreted graphically by stacking the bars of the PMF on top of each other, starting at the left. This idea is shown in Figure 2.4.
The sampling operation of Equation (2.2) can be expressed as finding i such that
P(i − 1) ≤ ξ < P(i),   (2.19)
Figure 2.3: A PMF for Four Events, Each with a Probability pi. The sum of their probabilities ∑i pi is necessarily 1.
Figure 2.4: A Discrete CDF, Corresponding to the PMF in Figure 2.3. Each column’s height is given by the PMF for the event that it represents plus the sum of the PMFs for the previous events, P(i) = p1 + ⋯ + pi.
which can be interpreted as inverting the CDF P, and thus, the name of the technique. Continuing the graphical interpretation, this sampling operation can be considered in terms of projecting the events’ probabilities onto the vertical axis where they cover the range [0, 1] and using a random variable ξ to select among them (see Figure 2.5). It should be clear that this draws from the correct distribution—the probability of the uniform sample hitting any particular bar is exactly equal to the height of that bar.
The SampleDiscrete() function implements this algorithm. It takes a not-necessarily normalized set of nonnegative weights, a uniform random sample u, and returns the index of one of the weights with probability proportional to its weight. The sampling operation it performs corresponds to finding i such that
w1 + ⋯ + wi−1 ≤ ξ ∑j wj < w1 + ⋯ + wi,   (2.20)
which corresponds to multiplying Equation (2.19) by ∑j wj. (Not requiring a normalized PMF is a convenience for calling code and not much more work in the function’s implementation.) Two optional parameters are provided to return the value of the PMF for the sample as well as a new uniform random sample that is derived from u.
This function is designed for the case where only a single sample needs to be generated from the weights’ distribution; if multiple samples are required, the AliasTable, which will be introduced in Section A.1, should generally be used instead: it generates samples in O(1) time after an O(n) preprocessing step, whereas SampleDiscrete() requires O(n) time for each sample generated.
〈Sampling Inline Functions〉 +≡
int SampleDiscrete(pstd::span<const Float> weights, Float u, Float *pmf,
Float *uRemapped) {
〈Handle empty weights for discrete sampling 71〉
〈Compute sum of weights 71〉
〈Compute rescaled u′ sample 71〉
〈Find offset in weights corresponding to u′ 71〉
〈Compute PMF and remapped u value, if necessary 71〉
return offset;
}
AliasTable 994
Float 23
The case of weights being empty is handled first so that subsequent code can assume that there is at least one weight.
〈Handle empty weights for discrete sampling〉 ≡
    if (weights.empty()) {
        if (pmf) *pmf = 0;
        return -1;
    }
The discrete probability of sampling the ith element is given by weights[i] divided by the sum of all weight values. Therefore, the function computes that sum next.
〈Compute sum of weights〉 ≡
    Float sumWeights = 0;
    for (Float w : weights)
        sumWeights += w;
Following Equation (2.20), the uniform sample u is scaled by the sum of the weights to get a value u′ that will be used to sample from them. Even though the provided u value should be in the range [0, 1), it is possible that u * sumWeights will be equal to sumWeights due to floating-point round-off. In that rare case, up is bumped down to the next lower floating-point value so that subsequent code can assume that up < sumWeights.
〈Compute rescaled u′ sample〉 ≡
    Float up = u * sumWeights;
    if (up == sumWeights)
        up = NextFloatDown(up);
We would now like to find the last offset in the weights array i where the random sample up is greater than the sum of weights up to i. Sampling is performed using a linear search from the start of the array, accumulating a sum of weights until the sum would be greater than u′.
〈Find offset in weights corresponding to u′〉 ≡
    int offset = 0;
    Float sum = 0;
    while (sum + weights[offset] <= up)
        sum += weights[offset++];
After the while loop terminates, the randomness in the provided sample u has only been used to select an element of the array—a discrete choice. The offset of a sample between the CDF values that bracket it is itself a uniform random value that can easily be remapped to [0, 1). This value is returned to the caller in uRemapped, if requested.
One might ask: why bother? It is not too difficult to generate uniform random variables, so the benefit of providing this option may seem marginal. However, for some of the high-quality sample generation algorithms in Chapter 8, it can be beneficial to reuse samples in this way rather than generating new ones—thus, this option is provided.
Float 23
NextFloatDown() 366
OneMinusEpsilon 470
〈Compute PMF and remapped u value, if necessary〉 ≡
    if (pmf)
        *pmf = weights[offset] / sumWeights;
    if (uRemapped)
        *uRemapped = std::min((up - sum) / weights[offset], OneMinusEpsilon);
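As a brief hypothetical usage example (not taken from pbrt's source; in particular, constructing the span directly from a plain array is an assumption here):

// Choose among three unnormalized weights {1, 2, 5} with the sample u = 0.5.
Float weights[3] = {1, 2, 5};
Float pmf, uRemapped;
int index = SampleDiscrete(pstd::span<const Float>(weights, 3), 0.5f,
                           &pmf, &uRemapped);
// The rescaled sample is u' = 0.5 * 8 = 4, which falls between the running
// sums 3 and 8, so index == 2, pmf == 5/8, and uRemapped == (4 - 3)/5 == 0.2.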
Figure 2.5: To use the inversion method to draw a sample from the distribution described by the PMF in Figure 2.3, a canonical uniform random variable is plotted on the vertical axis. By construction, the horizontal extension of ξ will intersect the box representing the ith outcome with probability pi. If the corresponding event is chosen for a set of random variables ξ, then the resulting distribution of events will be distributed according to the PMF.
In order to generalize this technique to continuous distributions, consider what happens as the number of discrete possibilities approaches infinity. The PMF from Figure 2.3 becomes a PDF, and the CDF from Figure 2.4 becomes its integral. The projection process is still the same, but it has a convenient mathematical interpretation—it represents inverting the CDF and evaluating the inverse at ξ.
More precisely, we can draw a sample Xi from a PDF p(x) with the following steps:
1. Integrate the PDF to find the CDF P(x) = ∫_0^x p(x′) dx′.
2. Obtain a uniformly distributed random number ξ.
3. Generate a sample by solving ξ = P(X) for X; in other words, find X = P⁻¹(ξ).
We will illustrate this algorithm with a simple example; see Section A.4 for its application to a number of additional functions.
Sampling a Linear Function
The function f(x) = (1 − x)a + xb defined over [0, 1] linearly interpolates between a at x = 0 and b at x = 1. Here we will assume that a, b ≥ 0; an exercise at the end of the chapter discusses the more general case.
〈Math Inline Functions〉 ≡
Float Lerp(Float x, Float a, Float b) {
return (1 - x) * a + x * b;
}
The function’s integral is ∫_0^1 f(x) dx = (a + b)/2, which gives the normalization constant 2/(a + b) to define its PDF,
p(x) = 2 ((1 − x)a + xb)/(a + b)
for x ∈ [0, 1] and zero otherwise.
Float 23
〈Sampling Inline Functions〉 +≡
Float LinearPDF(Float x, Float a, Float b) {
if (x < 0 || x > 1)
return 0;
return 2 * Lerp(x, a, b) / (a + b);
}
Integrating the PDF gives the CDF, which is the quadratic function
P(x) = ∫_0^x p(x′) dx′ = x (a(2 − x) + bx)/(a + b).
Inverting ξ = P(X) gives the sampling recipe
X = (√((1 − ξ)a² + ξb²) − a)/(b − a),
though note that in this form, the case a = b gives an indeterminate result. The more stable formulation
X = ξ(a + b)/(a + √((1 − ξ)a² + ξb²))
computes the same result and is implemented here.
〈Sampling Inline Functions〉 +≡
Float SampleLinear(Float u, Float a, Float b) {
if (u == 0 && a == 0) return 0;
Float x = u * (a + b) / (a + std::sqrt(Lerp(u, Sqr(a), Sqr(b))));
return std::min(x, OneMinusEpsilon);
}
One detail to note is the std::min call in the return statement, which ensures that the returned value is within the range [0, 1). Although the sampling algorithm generates values in that range given ξ ∈ [0, 1), round-off error may cause the result to be equal to 1. Because some of the code that calls the sampling routines depends on the returned values being in the specified range, the sampling routines must ensure this is so.
In addition to providing functions that sample from a distribution and compute the PDF of a sample, pbrt usually also provides functions that invert sampling operations, returning the random sample ξ that corresponds to a value x. In the 1D case, this is equivalent to evaluating the CDF.
〈Sampling Inline Functions〉 +≡
Float InvertLinearSample(Float x, Float a, Float b) {
return x * (a * (2 - x) + b * x) / (a + b);
}
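As a hypothetical usage note (not from pbrt's source), InvertLinearSample() undoes SampleLinear(), which can be checked directly:

// Sampling and then inverting recovers the original uniform value (up to
// floating-point round-off). For a = 1, b = 3, and u = 0.375, the sampled
// value is x = 0.5 and InvertLinearSample() returns exactly 0.375.
Float a = 1, b = 3, u = 0.375;
Float x = SampleLinear(u, a, b);              // x ~ p(x) = 2 Lerp(x, a, b)/(a + b)
Float uPrime = InvertLinearSample(x, a, b);   // evaluates the CDF P(x) = u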
Float 23
Lerp() 72
OneMinusEpsilon 470
Sqr() 1034
2.4 TRANSFORMING BETWEEN DISTRIBUTIONS
In describing the inversion method, we introduced a technique that generates samples according to some distribution by transforming canonical uniform random variables in a particular manner. Here, we will investigate the more general question of which distribution results when we transform samples from an arbitrary distribution to some other distribution with a function f. Understanding the effect of such transformations is useful for a few reasons, though here we will focus on how they allow us to derive multidimensional sampling algorithms.
Suppose we are given a random variable X drawn from some PDF p(x) with CDF P(x). Given a function f(x) with y = f(x), if we compute Y = f(X), we would like to find the distribution of the new random variable Y. In this case, the function f(x) must be a one-to-one transformation; if multiple values of x mapped to the same y value, then it would be impossible to unambiguously describe the probability density of a particular y value. A direct consequence of f being one-to-one is that its derivative must either be strictly greater than 0 or strictly less than 0, which implies that for a given x,
Pr{Y ≤ f(x)} = Pr{X ≤ x}.
From the definition of the CDF, Equation (2.3), we can see that
Pf(y) = Pf(f(x)) = P(x).
This relationship between CDFs leads directly to the relationship between their PDFs. If we assume that f’s derivative is greater than 0, differentiating gives
pf(y) df/dx = p(x),
and so
pf(y) = (df/dx)⁻¹ p(x).
In general, f’s derivative is either strictly positive or strictly negative, and the relationship between the densities is
pf(y) = |df/dx|⁻¹ p(x).
How can we use this formula? Suppose that p(x) = 2x over the domain [0, 1], and let f(x) = sin x. What is the PDF of the random variable Y = f(X)? Because we know that df/dx = cos x,
pf(y) = p(x)/|cos x| = 2x/cos x = 2 arcsin y / √(1 − y²).
This procedure may seem backward—usually we have some PDF that we want to sample from, not a given transformation. For example, we might have X drawn from some p(x) and would like to compute Y from some distribution pf(y). What transformation should we use? All we need is for the CDFs to be equal, or Pf(y) = P(x), which immediately gives the transformation
Y = Pf⁻¹(P(X)).
This is a generalization of the inversion method, since if X were uniformly distributed over [0, 1) then P(x) = x, and we have the same procedure as was introduced previously.
2.4.1 TRANSFORMATION IN MULTIPLE DIMENSIONS
In the general d-dimensional case, a similar derivation gives the analogous relationship between different densities. We will not show the derivation here; it follows the same form as the 1D case. Suppose we have a d-dimensional random variable X with density function p(x). Now let Y = T(X), where T is a bijection. In this case, the densities are related by
pT(y) = pT(T(x)) = p(x)/|JT(x)|,   (2.21)
where |JT| is the absolute value of the determinant of T’s Jacobian matrix JT(x), the d × d matrix whose (i, j) entry is ∂Ti/∂xj, where subscripts index dimensions of T(x) and x.
For a 2D example of the use of Equation (2.21), the polar transformation relates Cartesian (x, y) coordinates to a polar radius and angle,
x = r cos θ
y = r sin θ.
Suppose we draw samples from some density p(r, θ). What is the corresponding density p(x, y)? The Jacobian of this transformation is
JT = ( cos θ   −r sin θ
       sin θ    r cos θ ),
and the determinant is r (cos² θ + sin² θ) = r. So, p(x, y) = p(r, θ)/r. Of course, this is backward from what we usually want—typically we start with a sampling strategy in Cartesian coordinates and want to transform it to one in polar coordinates. In that case, we would have
p(r, θ) = r p(x, y).
In 3D, given the spherical coordinate representation of directions, Equation (3.7), the Jacobian of this transformation has determinant |JT| = r² sin θ, so the corresponding density function is
p(r, θ, φ) = r² sin θ p(x, y, z).
This transformation is important since it helps us represent directions as points (x, y, z) on the unit sphere.
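For instance, this density relationship is what justifies the familiar polar method for uniformly sampling a unit disk: a uniform Cartesian density p(x, y) = 1/π corresponds to p(r, θ) = r/π, which is sampled with r = √ξ₁ and θ = 2πξ₂. The following is only a sketch of that idea (the function name is illustrative, not pbrt’s own disk sampling routine, which is discussed in Section A.5).

Point2f SampleUniformDiskPolarSketch(Point2f u) {
    // p(x, y) = 1/Pi implies p(r, theta) = r/Pi: the marginal density
    // in r is 2r and theta is uniform over [0, 2*Pi).
    Float r = std::sqrt(u.x);
    Float theta = 2 * Pi * u.y;
    return Point2f(r * std::cos(theta), r * std::sin(theta));
}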
2.4.2 SAMPLING WITH MULTIDIMENSIONAL TRANSFORMATIONS
Suppose we have a 2D joint density function p(x, y) that we wish to draw samples (X, Y) from. If the densities are independent, they can be expressed as the product of 1D densities
p(x, y) = px(x) py(y),
and random variables (X, Y) can be found by independently sampling X from px and Y from py. Many useful densities are not separable, however, so we will introduce the theory of how to sample from multidimensional distributions in the general case.
Given a 2D density function, the marginal density function p(x) is obtained by “integrating out” one of the dimensions:
p(x) = ∫ p(x, y) dy.
This can be thought of as the density function for X alone. More precisely, it is the average density for a particular x over all possible y values.
If we can draw a sample X ∼ p(x), then—using Equation (2.1)—we can see that in order to sample Y, we need to sample from the conditional probability density, Y ∼ p(y|x), which is given by:
p(y|x) = p(x, y) / p(x).
Sampling from higher-dimensional distributions can be performed in a similar fashion, integrating out all but one of the dimensions, sampling that one, and then applying the same technique to the remaining conditional distribution, which has one fewer dimension.
Sampling the Bilinear Function
The bilinear function
f(x, y) = (1 − x)(1 − y) w0 + x (1 − y) w1 + (1 − x) y w2 + x y w3
interpolates between four values wi at the four corners of [0, 1]². (w0 is at (0, 0), w1 is at (1, 0), w2 at (0, 1), and w3 at (1, 1).) After integration and normalization, we can find that its PDF is
p(x, y) = 4 f(x, y) / (w0 + w1 + w2 + w3).
〈Sampling Inline Functions〉 +≡
Float BilinearPDF(Point2f p, pstd::span<const Float> w) {
if (p.x < 0 || p.x > 1 || p.y < 0 || p.y > 1)
return 0;
if (w[0] + w[1] + w[2] + w[3] == 0)
return 1;
return 4 * ((1 - p[0]) * (1 - p[1]) * w[0] + p[0] * (1 - p[1]) * w[1] +
(1 - p[0]) * p[1] * w[2] + p[0] * p[1] * w[3]) /
(w[0] + w[1] + w[2] + w[3]);
}
The two dimensions of this function are not independent, so the sampling method samples a marginal distribution before sampling the resulting conditional distribution.
〈Sampling Inline Functions〉 +≡
Point2f SampleBilinear(Point2f u, pstd::span<const Float> w) {
Point2f p;
〈Sample y for bilinear marginal distribution 77〉
〈Sample x for bilinear conditional distribution 77〉
return p;
}
We can choose either x or y to be the marginal distribution. If we choose y and integrate out x, we find that
p(y) = 2 ((1 − y)(w0 + w1) + y (w2 + w3)) / (w0 + w1 + w2 + w3).
Float 23
Point2f 92
SampleLinear() 73
p(y) performs linear interpolation between two constant values, and so we can use SampleLinear() to sample from the simplified proportional function since it normalizes the associated PDF.
〈Sample y for bilinear marginal distribution〉 ≡
    p.y = SampleLinear(u[1], w[0] + w[1], w[2] + w[3]);
Applying Equation (2.1) and again canceling out common factors, we have
p(x|y) = 2 ((1 − x) lerp(y, w0, w2) + x lerp(y, w1, w3)) / ((1 − y)(w0 + w1) + y (w2 + w3)),
which can also be sampled in x using SampleLinear().
〈Sample x for bilinear conditional distribution〉 ≡
    p.x = SampleLinear(u[0], Lerp(p.y, w[0], w[2]), Lerp(p.y, w[1], w[3]));
Because the bilinear sampling routine is based on the composition of two 1D linear sampling operations, it can be inverted by applying the inverses of those two operations in reverse order.
〈Sampling Inline Functions〉 +≡
Point2f InvertBilinearSample(Point2f p, pstd::span<const Float> w) {
return {InvertLinearSample(p.x, Lerp(p.y, w[0], w[2]),
Lerp(p.y, w[1], w[3])),
InvertLinearSample(p.y, w[0] + w[1], w[2] + w[3])};
}
See Section A.5 for further examples of multidimensional sampling algorithms, including techniques for sampling directions on the unit sphere and hemisphere, sampling unit disks, and other useful distributions for rendering.
The Monte Carlo method was introduced soon after the development of the digital computer by Stanislaw Ulam and John von Neumann (Ulam et al. 1947), though it also seems to have been independently invented by Enrico Fermi (Metropolis 1987). An early paper on Monte Carlo was written by Metropolis and Ulam (1949).
Many books have been written on Monte Carlo integration. Hammersley and Handscomb (1964), Spanier and Gelbard (1969), and Kalos and Whitlock (1986) are classic references. More recent books on the topic include those by Sobol′ (1994), Fishman (1996), and Liu (2001). We have also found Owen’s in-progress book (2019) to be an invaluable resource. Motwani and Raghavan (1995) have written an excellent introduction to the broader topic of randomized algorithms.
Most of the functions of interest in rendering are nonnegative; applying importance sampling to negative functions requires special care. A straightforward option is to define a sampling distribution that is proportional to the absolute value of the function. See also Owen and Zhou (2000) for a more effective sampling approach for such functions.
Multiple importance sampling was developed by Veach and Guibas (Veach and Guibas 1995; Veach 1997). Normally, a predetermined number of samples are taken using each sampling technique; see Pajot et al. (2011) and Lu et al. (2013) for approaches to adaptively distributing the samples over strategies in an effort to reduce variance by choosing those that are the best match to the integrand. Grittmann et al. (2019) tracked the variance of each sampling technique and then dynamically adjusted the MIS weights accordingly. The MIS compensation approach was developed by Karlík et al. (2019).
Float 23
InvertLinearSample() 73
Lerp() 72
Point2f 92
SampleLinear() 73
Sbert and collaborators (2016, 2017, 2018) have performed further variance analysis on MIS estimators and have developed improved methods based on allocating samples according to the variance and cost of each technique. Kondapaneni et al. (2019) considered the generalization of MIS to include negative weights and derived optimal estimators in that setting. West et al. (2020) considered the case where a continuum of sampling techniques are available and derived an optimal MIS estimator for that case, and Grittmann et al. (2021) have developed improved MIS estimators when correlation is present among samples (as is the case, for example, with bidirectional light transport algorithms).
Heitz (2020) described an inversion-based sampling method that can be applied when CDF inversion of a 1D function is not possible. It is based on sampling from a second function that approximates the first and then using a second random variable to adjust the sample to match the original function’s distribution. An interesting alternative to manually deriving sampling techniques was described by Anderson et al. (2017), who developed a domain-specific language for sampling where probabilities are automatically computed, given the implementation of a sampling algorithm. They showed the effectiveness of their approach with succinct implementations of a number of tricky sampling techniques.
The numerically stable sampling technique used in SampleLinear() is an application of Muller’s method (1956) due to Heitz (2020).
In applications of Monte Carlo in graphics, the integrand is often a product of factors, where no sampling distribution is available that fits the full product. While multiple importance sampling can give reasonable results in this case, at least minimizing variance from ineffective sampling techniques, sampling the full product is still preferable. Talbot et al. (2005) applied importance resampling to this problem, taking multiple samples from some distribution and then choosing among them with probability proportional to the full integrand. More recently, Hart et al. (2020) presented a simple technique based on warping uniform samples that can be used to approximate product sampling. For more information on this topic, see also the “Further Reading” sections of Chapters 13 and 14, which discuss product sampling approaches in the context of specific light transport algorithms.
Debugging Monte Carlo algorithms can be challenging, since it is their behavior in expectation that determines their correctness: it may be difficult to tell if the program execution for a particular sample is correct. Statistical tests can be an effective approach for checking their correctness. See the papers by Subr and Arvo (2007a) and by Jung et al. (2020) for applicable techniques.
See also the “Further Reading” section in Appendix A, which has information about the sampling algorithms implemented there as well as related approaches.
SampleLinear() 73
➋ 2.1 | Write a program that compares Monte Carlo and one or more alternative numerical integration techniques. Structure this program so that it is easy to replace the particular function being integrated. Verify that the different techniques compute the same result (given a sufficient number of samples for each of them). Modify your program so that it draws samples from distributions other than the uniform distribution for the Monte Carlo estimate, and verify that it still computes the correct result when the correct estimator, Equation (2.7), is used. (Make sure that any alternative distributions you use have nonzero probability of choosing any value of x where f (x) > 0.) |
➊ 2.2 | Write a program that computes unbiased Monte Carlo estimates of the integral of a given function. Compute an estimate of the variance of the estimates by performing a series of trials with successively more samples and computing the mean squared error for each one. Demonstrate numerically that variance decreases at a rate of O(1/n). |
➋ 2.3 | The algorithm for sampling the linear interpolation function in Section 2.3.2 implicitly assumes that a, b ≥ 0 and thus that f(x) ≥ 0. If f is negative, then the importance sampling PDF should be proportional to |f(x)|. Generalize SampleLinear() and the associated PDF and inversion functions to handle the case where f is always negative as well as the case where it crosses zero due to a and b having different signs. |
SampleLinear() 73
_________________
1 For brevity, we will refer to Monte Carlo integration simply as “Monte Carlo.”
2 Although the theory of Monte Carlo is based on using truly random numbers, in practice a well-written pseudo-random number generator (PRNG) is sufficient. pbrt uses a particularly high-quality PRNG that returns a sequence of pseudo-random values that is effectively as “random” as true random numbers. True random numbers, found by measuring random phenomena like atomic decay or atmospheric noise, are available from sources like www.random.org for those for whom PRNGs are not acceptable.
3 When computing expected values with a uniform distribution, we will drop the subscript p from Ep.
4 As a technical note, it is possible for an estimator with infinite variance to be unbiased but not consistent. Such estimators do not generally come up in rendering, however.
5 See Veach (1997) for a derivation of this result.
6 We will generally assume that f(x) ≥ 0; if it is negative, we might set p(x) ∝ |f(x)|. See the “Further Reading” section for more discussion of this topic.
7 Note that the definition of variance in Equation (2.8) does not preclude computing the variance of a function itself.
8 In general, the lower limit of integration should be −∞, although if p(x) = 0 for x < 0, this equation is equivalent.
03 GEOMETRY AND TRANSFORMATIONS
Almost all nontrivial graphics programs are built on a foundation of geometric classes that represent mathematical constructs like points, vectors, and rays. Because these classes are ubiquitous throughout the system, good abstractions and efficient implementations are critical. This chapter presents the interface to and implementation of pbrt’s geometric foundation. Note that these are not the classes that represent the actual scene geometry (triangles, spheres, etc.); those classes are the topic of Chapter 6.
As is typical in computer graphics, pbrt represents three-dimensional points, vectors, and normal vectors with three coordinate values: x, y, and z. These values are meaningless without a coordinate system that defines the origin of the space and gives three linearly independent vectors that define the x, y, and z axes of the space. Together, the origin and three vectors are called the frame that defines the coordinate system. Given an arbitrary point or direction in 3D, its (x, y, z) coordinate values depend on its relationship to the frame. Figure 3.1 shows an example that illustrates this idea in 2D.
In the general n-dimensional case, a frame’s origin po and its n linearly independent basis vectors define an n-dimensional affine space. All vectors v in the space can be expressed as a linear combination of the basis vectors. Given a vector v and the basis vectors vi, there is a unique set of scalar values si such that
v = s1v1 + … + snvn.
The scalars si are the representation of v with respect to the basis {v1, v2, … , vn} and are the coordinate values that we store with the vector. Similarly, for all points p, there are unique scalars si such that the point can be expressed in terms of the origin po and the basis vectors
p = po + s1v1 + … + snvn.
Thus, although points and vectors are both represented by x, y, and z coordinates in 3D, they are distinct mathematical entities and are not freely interchangeable.
Figure 3.1: In 2D, the (x, y) coordinates of a point p are defined by the relationship of the point to a particular 2D coordinate system. Here, two coordinate systems are shown; the point might have coordinates (3, 3) with respect to the coordinate system with its coordinate axes drawn in solid lines but have coordinates (2, −4) with respect to the coordinate system with dashed axes. In either case, the 2D point p is at the same absolute position in space.
Figure 3.2: (a) In a left-handed coordinate system, the z axis points into the page when the x and y axes are oriented with x pointing to the right and y pointing up. (b) In a right-handed system, the z axis points out of the page.
This definition of points and vectors in terms of coordinate systems reveals a paradox: to define a frame we need a point and a set of vectors, but we can only meaningfully talk about points and vectors with respect to a particular frame. Therefore, in three dimensions we need a standard frame with origin (0, 0, 0) and basis vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). All other frames will be defined with respect to this canonical coordinate system, which we call world space.
3.1.1 COORDINATE SYSTEM HANDEDNESS
There are two different ways that the three coordinate axes can be arranged, as shown in Figure 3.2. Given perpendicular x and y coordinate axes, the z axis can point in one of two directions. These two choices are called left-handed and right-handed. The choice between the two is arbitrary but has a number of implications for how some of the geometric operations throughout the system are implemented. pbrt uses a left-handed coordinate system.
pbrt’s classes that represent two- and three-dimensional points, vectors, and surface normals are all based on general n-tuple classes, whose definitions we will start with. The definitions of these classes as well as the types that inherit from them are defined in the files util/vecmath.h and util/vecmath.cpp under the main pbrt source directory.
Although this and the following few sections define classes that have simple logic in most of their method implementations, they make more use of advanced C++ programming techniques than we generally use in pbrt. Doing so reduces the amount of redundant code needed to implement the point, vector, and normal classes and makes them extensible in ways that will be useful later. If you are not a C++ expert, it is fine to gloss over these details and to focus on understanding the functionality that these classes provide. Alternatively, you could use this as an opportunity to learn more corners of the language.
Both Tuple2 and Tuple3 are template classes. They are templated not just on a type used for storing each coordinate’s value but also on the type of the class that inherits from it to define a specific two- or three-dimensional type. If one has not seen it before, this is a strange construction: normally, inheritance is sufficient, and the base class has no need to know the type of the subclass.1 In this case, having the base class know the child class’s type makes it possible to write generic methods that operate on and return values of the child type, as we will see shortly.
〈Tuple2 Definition〉 ≡
template <template <typename> class Child, typename T>
class Tuple2 {
public:
〈Tuple2 Public Methods〉
〈Tuple2 Public Members 83〉
};
The two-dimensional tuple stores its values as x and y and makes them available as public member variables. The pair of curly braces after each one ensures that the member variables are default initialized; for numeric types, this initializes them to 0.
〈Tuple2 Public Members〉 ≡
    T x{}, y{};
We will focus on the Tuple3 implementation for the remainder of this section. Tuple2 is almost entirely the same but with one fewer coordinate.
〈Tuple3 Definition〉 ≡
template <template <typename> class Child, typename T>
class Tuple3 {
public:
〈Tuple3 Public Methods 84〉
〈Tuple3 Public Members 84〉
};
Tuple3 83
By default, the (x, y, z) values are set to zero, although the user of the class can optionally supply values for each of the components. If the user does supply values, the constructor checks that none of them has the floating-point “not a number” (NaN) value using the DCHECK() macro. When compiled in optimized mode, this macro disappears from the compiled code, saving the expense of verifying this case. NaNs almost certainly indicate a bug in the system; if a NaN is generated by some computation, we would like to catch it as soon as possible in order to make isolating its source easier. (See Section 6.8.1 for more discussion of NaN values.)
〈Tuple3 Public Methods〉 ≡
    Tuple3(T x, T y, T z) : x(x), y(y), z(z) { DCHECK(!HasNaN()); }
Readers who have been exposed to object-oriented design may question our decision to make the tuple component values publicly accessible. Typically, member variables are only accessible inside their class, and external code that wishes to access or modify the contents of a class must do so through a well-defined API that may include selector and mutator functions. Although we are sympathetic to the principle of encapsulation, it is not appropriate here. The purpose of selector and mutator functions is to hide the class’s internal implementation details. In the case of three-dimensional tuples, hiding this basic part of their design gains nothing and adds bulk to code that uses them.
〈Tuple3 Public Members〉 ≡
    T x{}, y{}, z{};
The HasNaN() test checks each component individually.
〈Tuple3 Public Methods〉 +≡
    bool HasNaN() const { return IsNaN(x) || IsNaN(y) || IsNaN(z); }
An alternate implementation of these two tuple classes would be to have a single template class that is also parameterized with an integer number of dimensions and to represent the coordinates with an array of that many T values. While this approach would reduce the total amount of code by eliminating the need for separate two- and three-dimensional tuple types, individual components of the vector could not be accessed as v.x and so forth. We believe that, in this case, a bit more code in the vector implementations is worthwhile in return for more transparent access to components. However, some routines do find it useful to be able to easily loop over the components of vectors; the tuple classes also provide a C++ operator to index into the components so that, given an instance v, v[0] == v.x and so forth.
〈Tuple3 Public Methods〉 +≡
    T operator[](int i) const {
        if (i == 0) return x;
        if (i == 1) return y;
        return z;
    }
If the tuple type is non-const, then indexing returns a reference, allowing components of the tuple to be set.
〈Tuple3 Public Methods〉 +≡
    T &operator[](int i) {
        if (i == 0) return x;
        if (i == 1) return y;
        return z;
    }
DCHECK() 1066
IsNaN() 363
Tuple3 83
We can now turn to the implementation of arithmetic operations that operate on the values stored in a tuple. Their code is fairly dense. For example, here is the method that adds together two three-tuples of some type (for example, Child might be Vector3, the forthcoming three-dimensional vector type).
〈Tuple3 Public Methods〉 +≡
    template <typename U>
    auto operator+(Child<U> c) const -> Child<decltype(T{} + U{})> {
        return {x + c.x, y + c.y, z + c.z};
    }
There are a few things to note in the implementation of operator+. By virtue of being a template method based on another type U, it supports adding two elements of the same Child template type, though they may use different types for storing their components (T and U in the code here). However, because the base type of the method’s parameter is Child, it is only possible to add two values of the same child type using this method. If this method instead took a Tuple3 for the parameter, then it would silently allow addition with any type that inherited from Tuple3, which might not be intended.
There are two interesting things in the declaration of the return type, to the right of the -> operator after the method’s parameter list. First, the base return type is Child; thus, if one adds two Vector3 values, the returned value will be of Vector3 type. This, too, eliminates a class of potential errors: if a Tuple3 was returned, then it would for example be possible to add two Vector3s and assign the result to a Point3, which is nonsensical. Finally, the component type of the returned type is determined based on the type of an expression adding values of types T and U. Thus, this method follows C++’s standard type promotion rules: if a Vector3 that stored integer values is added to one that stores Floats, the result is a Vector3 storing Floats.
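A short usage sketch of the resulting type promotion (the variable names are illustrative only):

Vector3i vi(1, 2, 3);
Vector3f vf(0.5f, 0.25f, 0.125f);
auto v = vi + vf;   // v is a Vector3<Float>: (1.5, 2.25, 3.125)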
In the interests of space, we will not include the other Tuple3 arithmetic operators here, nor will we include the various other utility functions that perform component-wise operations on them. The full list of capabilities provided by Tuple2 and Tuple3 is:
Tuple2 83
Tuple3 83
pbrt provides both 2D and 3D vector classes that are based on the corresponding two- and three-dimensional tuple classes. Both vector types are themselves parameterized by the type of the underlying vector element, thus making it easy to instantiate vectors of both integer and floating-point types.
〈Vector2 Definition〉 ≡
template <typename T>
class Vector2 : public Tuple2<Vector2, T> {
public:
〈Vector2 Public Methods〉
};
Two-dimensional vectors of Floats and integers are widely used, so we will define aliases for those two types.
〈Vector2* Definitions〉 ≡
using Vector2f = Vector2<Float>;
using Vector2i = Vector2<int>;
As with Tuple2, we will not include any further details of Vector2 since it is very similar to Vector3, which we will discuss in more detail.
A Vector3’s tuple of component values gives its representation in terms of the x, y, and z (in 3D) axes of the space it is defined in. The individual components of a 3D vector v will be written vx, vy, and vz.
〈Vector3 Definition〉 ≡
template <typename T>
class Vector3 : public Tuple3<Vector3, T> {
public:
〈Vector3 Public Methods 86〉
};
We also define type aliases for two commonly used three-dimensional vector types.
〈Vector3* Definitions〉 ≡
using Vector3f = Vector3<Float>;
using Vector3i = Vector3<int>;
Vector3 provides a few constructors, including a default constructor (not shown here) and one that allows specifying each component value directly.
〈Vector3 Public Methods〉 ≡
    Vector3(T x, T y, T z) : Tuple3<pbrt::Vector3, T>(x, y, z) {}
There is also a constructor that takes a Vector3 with a different element type. It is qualified with explicit so that it is not unintentionally used in automatic type conversions; a cast must be used to signify the intent of the type conversion.
Float 23
Tuple2 83
Tuple3 83
Vector2 86
Vector3 86
〈Vector3 Public Methods〉 +≡
template <typename U>
explicit Vector3(Vector3<U> v)
: Tuple3<pbrt::Vector3, T>(T(v.x), T(v.y), T(v.z)) {}
Figure 3.3: (a) Vector addition: v + w. (b) Notice that the sum v + w forms the diagonal of the parallelogram formed by v and w, which shows the commutativity of vector addition: v + w = w + v.
Figure 3.4: (a) Vector subtraction. (b) If we consider the parallelogram formed by two vectors, the diagonals are given by w − v (dashed line) and −v − w (not shown).
Finally, constructors are provided to convert from the forthcoming Point3 and Normal3 types. Their straightforward implementations are not included here. These, too, are explicit to help ensure that they are only used in situations where the conversion is meaningful.
〈Vector3 Public Methods〉 +≡
    template <typename U>
    explicit Vector3(Point3<U> p);
    template <typename U>
    explicit Vector3(Normal3<U> n);
Addition and subtraction of vectors is performed component-wise, via methods from Tuple3. The usual geometric interpretation of vector addition and subtraction is shown in Figures 3.3 and 3.4. A vector’s length can be changed via component-wise multiplication or division by a scalar. These capabilities, too, are provided by Tuple3 and so do not require any additional implementation in the Vector3 class.
3.3.1 NORMALIZATION AND VECTOR LENGTH
It is often necessary to normalize a vector—that is, to compute a new vector pointing in the same direction but with unit length. A normalized vector is often called a unit vector. The notation used in this book for normalized vectors is that v̂ is the normalized version of v. Before getting to normalization, we will start with computing vectors’ lengths.
The squared length of a vector is given by the sum of the squares of its component values.
〈Vector3 Inline Functions〉 ≡
template <typename T>
T LengthSquared(Vector3<T> v) { return Sqr(v.x) + Sqr(v.y) + Sqr(v.z); }
Normal3 94
Point3 92
Sqr() 1034
Tuple3 83
Vector3 86
Moving on to computing the length of a vector leads us to a quandary: what type should the Length() function return? For example, if the Vector3 stores an integer type, that type is probably not an appropriate return type since the vector’s length will not necessarily be integer-valued. In that case, Float would be a better choice, though we should not standardize on Float for everything, because given a Vector3 of double-precision values, we should return the length as a double as well. Continuing our journey through advanced C++, we turn to a technique known as type traits to solve this dilemma.
First, we define a general TupleLength template class that holds a type definition, type. The default is set here to be Float.
〈TupleLength Definition〉 ≡
template <typename T>
struct TupleLength { using type = Float; };
For Vector3s of doubles, we also provide a template specialization that defines double as the type for length given double for the element type.
〈TupleLength Definition〉 +≡
template <>
struct TupleLength<double> { using type = double; };
Now we can implement Length(), using TupleLength to determine which type to return. Note that the return type cannot be specified before the function declaration is complete since the type T is not known until the function parameters have been parsed. Therefore, the function is declared as auto with the return type specified after its parameter list.
〈Vector3 Inline Functions〉 +≡
template <typename T>
auto Length(Vector3<T> v) -> typename TupleLength<T>::type {
using std::sqrt;
return sqrt(LengthSquared(v));
}
There is one more C++ subtlety in these few lines of code: the reader may wonder, why have a using std::sqrt declaration in the implementation of Length() and then call sqrt(), rather than just calling std::sqrt() directly? That construction is used because we would like to be able to use component types T that do not have overloaded versions of std::sqrt() available to them. For example, we will later make use of Vector3s that store intervals of values for each component using a forthcoming Interval class. With the way the code is written here, if std::sqrt() supports the type T, the std variant of the function is called. If not, then so long as we have defined a function named sqrt() that takes our custom type, that version will be used.
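As a minimal sketch of that mechanism, a hypothetical component type can opt in by providing its own sqrt() overload, which the unqualified call then finds via argument-dependent lookup; such a type also needs a TupleLength specialization so that Length() returns it rather than the default Float. (This sketch ignores other requirements, such as an IsNaN() overload, that a complete pbrt component type would need.)

struct MyUnit {
    double v;
    MyUnit operator+(MyUnit b) const { return {v + b.v}; }
    MyUnit operator*(MyUnit b) const { return {v * b.v}; }
};
// Found by argument-dependent lookup from the unqualified sqrt() call.
MyUnit sqrt(MyUnit a) { return {std::sqrt(a.v)}; }
// Make Length() return MyUnit instead of the default Float.
template <> struct TupleLength<MyUnit> { using type = MyUnit; };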
With all of this in hand, the implementation of Normalize() is thankfully now trivial. The use of auto for the return type ensures that if for example Normalize() is called with a vector with integer components, then the returned vector type has Float components according to type conversion from the division operator.
〈Vector3 Inline Functions〉 +≡
template <typename T>
auto Normalize(Vector3<T> v) { return v / Length(v); }
Float 23
Interval 1057
Length() 88
LengthSquared() 87
TupleLength 88
Vector3 86
Two useful operations on vectors are the dot product (also known as the scalar or inner product) and the cross product. For two 3D vectors v and w, their dot product (v · w) is defined as
vx wx + vy wy + vz wz,
and the implementation follows directly.
〈Vector3 Inline Functions〉 +≡
template <typename T>
T Dot(Vector3<T> v, Vector3<T> w) {
return v.x * w.x + v.y * w.y + v.z * w.z;
}
A few basic properties directly follow from the definition of the dot product. For example, if u, v, and w are vectors and s is a scalar value, then:
(u · v) = (v · u)
(su · v) = s(u · v)
(u · (v + w)) = (u · v) + (u · w).
The dot product has a simple relationship to the angle between the two vectors:
(v · w) = ‖v‖ ‖w‖ cos θ,   (3.1)
where θ is the angle between v and w, and ‖v‖ denotes the length of the vector v. It follows from this that (v · w) is zero if and only if v and w are perpendicular, provided that neither v nor w is degenerate—equal to (0, 0, 0). A set of two or more mutually perpendicular vectors is said to be orthogonal. An orthogonal set of unit vectors is called orthonormal.
It follows from Equation (3.1) that if v and w are unit vectors, their dot product is the cosine of the angle between them. As the cosine of the angle between two vectors often needs to be computed for rendering, we will frequently make use of this property.
If we would like to find the angle between two normalized vectors, we could use the standard library’s inverse cosine function, passing it the value of the dot product between the two vectors. However, that approach can suffer from a loss of accuracy when the two vectors are nearly parallel or facing in nearly opposite directions. The following reformulation does more of its computation with values close to the origin where there is more floating-point precision, giving a more accurate result.
〈Vector3 Inline Functions〉 +≡
template <typename T>
Float AngleBetween(Vector3<T> v1, Vector3<T> v2) {
if (Dot(v1, v2) < 0)
return Pi - 2 * SafeASin(Length(v1 + v2) / 2);
else
return 2 * SafeASin(Length(v2 - v1) / 2);
}
AbsDot() 90
Dot() 89
Float 23
Length() 88
Pi 1033
SafeASin() 1035
Vector3 86
We will frequently need to compute the absolute value of the dot product as well. The AbsDot() function does this for us so that a separate call to std::abs() is not necessary in that case.
Figure 3.5: The orthogonal projection of a vector v onto a normalized vector ŵ gives a vector vo that is parallel to ŵ. The difference vector, v − vo, shown here as a dashed line, is perpendicular to ŵ.
〈Vector3 Inline Functions〉 +≡
template <typename T>
T AbsDot(Vector3<T> v1, Vector3<T> v2) { return std::abs(Dot(v1, v2)); }
A useful operation on vectors that is based on the dot product is the Gram–Schmidt process, which transforms a set of non-orthogonal vectors that form a basis into orthogonal vectors that span the same basis. It is based on successive application of the orthogonal projection of a vector v onto a normalized vector ŵ, which is given by (v · ŵ)ŵ (see Figure 3.5). The orthogonal projection can be used to compute a new vector
v⊥ = v − (v · ŵ)ŵ   (3.2)
that is orthogonal to w. An advantage of computing v⊥ in this way is that v⊥ and w span the same subspace as v and w did.
The GramSchmidt() function implements Equation (3.2); it expects the vector w to already be normalized.
〈Vector3 Inline Functions〉 +≡
template <typename T>
Vector3<T> GramSchmidt(Vector3<T> v, Vector3<T> w) {
return v - Dot(v, w) * w;
}
The cross product is another useful operation for 3D vectors. Given two vectors in 3D, the cross product v×w is a vector that is perpendicular to both of them. Given orthogonal vectors v and w, then v×w is defined to be a vector such that (v, w, v×w) form an orthogonal coordinate system.
The cross product is defined as:
(v × w)x = vy wz − vz wy
(v × w)y = vz wx − vx wz
(v × w)z = vx wy − vy wx.
A way to remember this is to compute the determinant of the matrix:
| i    j    k  |
| vx   vy   vz |
| wx   wy   wz |
Dot() 89
Vector3 86
where i, j, and k represent the axes (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. Note that this equation is merely a memory aid and not a rigorous mathematical construction, since the matrix entries are a mix of scalars and vectors.
Figure 3.6: The area of a parallelogram with edges given by vectors v1 and v2 is equal to ‖v1‖ h. From Equation (3.3), the length of the cross product of v1 and v2 is equal to the product of the two vector lengths times the sine of the angle between them—the parallelogram area.
The cross product implementation here uses the DifferenceOfProducts() function that is introduced in Section B.2.9. Given values a, b, c, and d, it computes a*b-c*d in a way that maintains more floating-point accuracy than a direct implementation of that expression would. This concern is not a theoretical one: previous versions of pbrt have resorted to using double precision for the implementation of Cross() so that numerical error would not lead to artifacts in rendered images. Using DifferenceOfProducts() is a better solution since it can operate entirely in single precision while still computing a result with low error.
〈Vector3 Inline Functions〉 +≡
template <typename T>
Vector3<T> Cross(Vector3<T> v, Vector3<T> w) {
return {DifferenceOfProducts(v.y, w.z, v.z, w.y),
DifferenceOfProducts(v.z, w.x, v.x, w.z),
DifferenceOfProducts(v.x, w.y, v.y, w.x)};
}
From the definition of the cross product, we can derive
‖v × w‖ = ‖v‖ ‖w‖ |sin θ|,   (3.3)
where θ is the angle between v and w. An important implication of this is that the cross product of two perpendicular unit vectors is itself a unit vector. Note also that the result of the cross product is a degenerate vector if v and w are parallel.
This definition also shows a convenient way to compute the area of a parallelogram (Figure 3.6). If the two edges of the parallelogram are given by vectors v1 and v2, and it has height h, the area is given by ‖v1‖ h. Since h = sin θ‖v2‖, we can use Equation (3.3) to see that the area is ‖v1×v2‖.
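For example, a small helper (not part of pbrt) that computes this area directly follows from Equation (3.3):

Float ParallelogramArea(Vector3f v1, Vector3f v2) {
    // ||v1 x v2|| = ||v1|| ||v2|| |sin theta|, the parallelogram's area.
    return Length(Cross(v1, v2));
}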
3.3.3 COORDINATE SYSTEM FROM A VECTOR
We will sometimes find it useful to construct a local coordinate system given only a single normalized 3D vector. To do so, we must find two additional normalized vectors such that all three vectors are mutually perpendicular.
DifferenceOfProducts() 1044
Vector3 86
Given a vector v, it can be shown that the two vectors
v2 = (1 − vx²/(1 + vz), −vx vy/(1 + vz), −vx)
v3 = (−vx vy/(1 + vz), 1 − vy²/(1 + vz), −vy)
fulfill these conditions. However, computing these vectors directly has high error when vz ≈ −1 due to a loss of accuracy when 1/(1 + vz) is calculated. A reformulation of that computation, used in the following implementation, addresses that issue.
〈Vector3 Inline Functions〉 +≡
template <typename T>
void CoordinateSystem(Vector3<T> v1, Vector3<T> *v2, Vector3<T> *v3) {
Float sign = pstd::copysign(Float(1), v1.z);
Float a = -1 / (sign + v1.z);
Float b = v1.x * v1.y * a;
*v2 = Vector3<T>(1 + sign * Sqr(v1.x) * a, sign * b, -sign * v1.x);
*v3 = Vector3<T>(b, sign + Sqr(v1.y) * a, -v1.y);
}
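A brief usage sketch: given any normalized vector, the two returned vectors complete an orthonormal basis (the variable names here are illustrative).

Vector3f v1 = Normalize(Vector3f(1, 2, 3)), v2, v3;
CoordinateSystem(v1, &v2, &v3);
// Dot(v1, v2), Dot(v1, v3), and Dot(v2, v3) are all ~0, and v2 and v3
// have length ~1, up to floating-point round-off.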
A point is a zero-dimensional location in 2D or 3D space. The Point2 and Point3 classes in pbrt represent points in the obvious way: using x, y, z (in 3D) coordinates with respect to a coordinate system. Although the same representation is used for vectors, the fact that a point represents a position whereas a vector represents a direction leads to a number of important differences in how they are treated. Points are denoted in text by p.
In this section, we will continue the approach of only including implementations of the 3D point methods for the Point3 class here.
〈Point3 Definition〉 ≡
template <typename T>
class Point3 : public Tuple3<Point3, T> {
public:
〈Point3 Public Methods 92〉
};
As with vectors, it is helpful to have shorter type names for commonly used point types.
〈Point3* Definitions〉 ≡
using Point3f = Point3<Float>;
using Point3i = Point3<int>;
It is also useful to be able to convert a point with one element type (e.g., a Point3f) to a point of another one (e.g., Point3i) as well as to be able to convert a point to a vector with a different underlying element type. The following constructor and conversion operator provide these conversions. Both also require an explicit cast, to make it clear in source code when they are being used.
〈Point3 Public Methods〉 ≡
    template <typename U>
    explicit Point3(Point3<U> p)
        : Tuple3<pbrt::Point3, T>(T(p.x), T(p.y), T(p.z)) {}
    template <typename U>
    explicit Point3(Vector3<U> v)
        : Tuple3<pbrt::Point3, T>(T(v.x), T(v.y), T(v.z)) {}
There are certain Point3 methods that either return or take a Vector3. For instance, one can add a vector to a point, offsetting it in the given direction to obtain a new point. Analogous methods, not included in the text, also allow subtracting a vector from a point.
Float 23
Point3 92
Sqr() 1034
Tuple3 83
Vector3 86
Figure 3.7: Obtaining the Vector between Two Points. The vector v = p′ − p is given by the component-wise subtraction of the points p′ and p.
〈Point3 Public Methods〉 +≡
    template <typename U>
    auto operator+(Vector3<U> v) const -> Point3<decltype(T{} + U{})> {
        return {x + v.x, y + v.y, z + v.z};
    }
    template <typename U>
    Point3<T> &operator+=(Vector3<U> v) {
        x += v.x; y += v.y; z += v.z;
        return *this;
    }
Alternately, one can subtract one point from another, obtaining the vector between them, as shown in Figure 3.7.
〈Point3 Public Methods〉 +≡
    template <typename U>
    auto operator-(Point3<U> p) const -> Vector3<decltype(T{} - U{})> {
        return {x - p.x, y - p.y, z - p.z};
    }
The distance between two points can be computed by subtracting them to compute the vector between them and then finding the length of that vector. Note that we can just use auto for the return type and let it be set according to the return type of Length(); there is no need to use the TupleLength type trait to find that type.
〈Point3 Inline Functions〉 ≡
template <typename T>
auto Distance(Point3<T> p1, Point3<T> p2) { return Length(p1 - p2); }
The squared distance between two points can be similarly computed using LengthSquared().
〈Point3 Inline Functions〉 +≡
template <typename T>
auto DistanceSquared(Point3<T> p1, Point3<T> p2) {
return LengthSquared(p1 - p2);
}
Length() 88
LengthSquared() 87
Point3 92
TupleLength 88
Vector3 86
A surface normal (or just normal) is a vector that is perpendicular to a surface at a particular position. It can be defined as the cross product of any two nonparallel vectors that are tangent to the surface at a point. Although normals are superficially similar to vectors, it is important to distinguish between the two of them: because normals are defined in terms of their relationship to a particular surface, they behave differently than vectors in some situations, particularly when applying transformations. (That difference is discussed in Section 3.10.)
〈Normal3 Definition〉 ≡
template <typename T>
class Normal3 : public Tuple3<Normal3, T> {
public:
〈Normal3 Public Methods 94〉
};
〈Normal3 Definition〉 +≡
using Normal3f = Normal3<Float>;
The implementations of Normal3s and Vector3s are very similar. Like vectors, normals are represented by three components x, y, and z; they can be added and subtracted to compute new normals; and they can be scaled and normalized. However, a normal cannot be added to a point, and one cannot take the cross product of two normals. Note that, in an unfortunate turn of terminology, normals are not necessarily normalized.
In addition to the usual constructors (not included here), Normal3 allows conversion from Vector3 values given an explicit typecast, similarly to the other Tuple2- and Tuple3-based classes.
〈Normal3 Public Methods〉 ≡
    template <typename U>
    explicit Normal3<T>(Vector3<U> v)
        : Tuple3<pbrt::Normal3, T>(T(v.x), T(v.y), T(v.z)) {}
The Dot() and AbsDot() functions are also overloaded to compute dot products between the various possible combinations of normals and vectors. This code will not be included in the text here. We also will not include implementations of all the various other Normal3 methods here, since they are similar to those for vectors.
One new operation to implement comes from the fact that it is often necessary to flip a surface normal so it lies in the same hemisphere as a given vector—for example, the surface normal that lies in the same hemisphere as a ray leaving a surface is frequently needed. The FaceForward() utility function encapsulates this small computation. (pbrt also provides variants of this function for the other three combinations of Vector3s and Normal3s as parameters.) Be careful when using the other instances, though: when using the version that takes two Vector3s, for example, ensure that the first parameter is the one that should be returned (possibly flipped) and the second is the one to test against. Reversing the two parameters will give unexpected results.
〈Normal3 Inline Functions〉 ≡
template <typename T>
Normal3<T> FaceForward(Normal3<T> n, Vector3<T> v) {
return (Dot(n, v) < 0.f) ? -n : n;
}
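A typical usage sketch, with illustrative variable names: flipping a surface normal so that it lies in the same hemisphere as the outgoing direction at a surface.

Normal3f n(0, 0, 1);
Vector3f wo(0.3f, 0.2f, -0.9f);   // direction leaving the surface
n = FaceForward(n, wo);           // Dot(n, wo) < 0, so n becomes (0, 0, -1)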
AbsDot() 90
Dot() 89
Float 23
Normal3 94
Point3f 92
Ray 95
Tuple2 83
Tuple3 83
Vector3 86
Vector3f 86
A ray r is a semi-infinite line specified by its origin o and direction d; see Figure 3.8. pbrt represents Rays using a Point3f for the origin and a Vector3f for the direction; there is no need for non-Float-based rays in pbrt. See the files ray.h and ray.cpp in the pbrt source code distribution for the implementation of the Ray class.
Figure 3.8: A ray is a semi-infinite line defined by its origin o and its direction vector d.
〈Ray Definition〉 ≡
class Ray {
public:
〈Ray Public Methods 95〉
〈Ray Public Members 95〉
};
Because we will be referring to these variables often throughout the code, the origin and direction members of a Ray are succinctly named o and d. Note that we again make the data publicly available for convenience.
〈Ray Public Members〉 ≡
    Point3f o;
    Vector3f d;
The parametric form of a ray expresses it as a function of a scalar value t, giving the set of points that the ray passes through:
r(t) = o + td,   0 ≤ t < ∞.   (3.4)
The Ray class overloads the function application operator for rays in order to match the r(t) notation in Equation (3.4).
〈Ray Public Methods〉 ≡
    Point3f operator()(Float t) const { return o + d * t; }
Given this method, when we need to find the point at a particular position along a ray, we can write code like:
Ray r(Point3f(0, 0, 0), Vector3f(1, 2, 3));
Point3f p = r(1.7);
Each ray also has a time value associated with it. In scenes with animated objects, the rendering system constructs a representation of the scene at the appropriate time for each ray.
〈Ray Public Members〉 +≡
    Float time = 0;
Float 23
Medium 714
Point3f 92
Ray 95
Ray::d 95
Ray::o 95
Vector3f 86
Each ray also records the medium at its origin. The Medium class, which will be introduced in Section 11.4, encapsulates the (potentially spatially varying) properties of participating media such as a foggy atmosphere, smoke, or scattering liquids like milk. Associating this information with rays makes it possible for other parts of the system to account correctly for the effect of rays passing from one medium to another.
〈Ray Public Members〉 +≡
    Medium medium = nullptr;
Constructing Rays is straightforward. The default constructor relies on the Point3f and Vector3f constructors to set the origin and direction to (0, 0, 0). Alternately, a particular point and direction can be provided. If an origin and direction are provided, the constructor allows values to be given for the ray’s time and medium.
〈Ray Public Methods〉 +≡
    Ray(Point3f o, Vector3f d, Float time = 0.f, Medium medium = nullptr)
        : o(o), d(d), time(time), medium(medium) {}
To be able to perform better antialiasing with the texture functions defined in Chapter 10, pbrt makes use of the RayDifferential class, which is a subclass of Ray that contains additional information about two auxiliary rays. These extra rays represent camera rays offset by one sample in the x and y direction from the main ray on the film plane. By determining the area that these three rays project to on an object being shaded, a Texture can estimate an area to average over for proper antialiasing (Section 10.1).
Because RayDifferential inherits from Ray, geometric interfaces in the system can be written to take const Ray & parameters, so that either a Ray or RayDifferential can be passed to them. Only the routines that need to account for antialiasing and texturing require RayDifferential parameters.
〈RayDifferential Definition〉 ≡
class RayDifferential : public Ray {
public:
〈RayDifferential Public Methods 96〉
〈RayDifferential Public Members 96〉
};
The RayDifferential constructor mirrors the Ray’s.
〈RayDifferential Public Methods〉 ≡
    RayDifferential(Point3f o, Vector3f d, Float time = 0.f,
                    Medium medium = nullptr)
        : Ray(o, d, time, medium) {}
In some cases, differential rays may not be available. Routines that take RayDifferential parameters should check the hasDifferentials member variable before accessing the differential rays’ origins or directions.
〈RayDifferential Public Members〉 ≡
    bool hasDifferentials = false;
    Point3f rxOrigin, ryOrigin;
    Vector3f rxDirection, ryDirection;
There is also a constructor to create a RayDifferential from a Ray. As with the previous constructor, the default false value of the hasDifferentials member variable is left as is.
〈RayDifferential Public Methods〉 +≡
    explicit RayDifferential(const Ray &ray) : Ray(ray) {}
Camera 206
Float 23
Medium 714
Point3f 92
Ray 95
RayDifferential 96
Texture 655
Vector3f 86
Camera implementations in pbrt compute differentials for rays leaving the camera under the assumption that camera rays are spaced one pixel apart. Integrators usually generate multiple camera rays per pixel, in which case the actual distance between samples is lower and the differentials should be updated accordingly; if this factor is not accounted for, then textures in images will generally be too blurry. The ScaleDifferentials() method below takes care of this, given an estimated sample spacing of s. It is called, for example, by the fragment 〈Generate camera ray for current sample〉 in Chapter 1.
〈RayDifferential Public Methods〉 +≡
    void ScaleDifferentials(Float s) {
        rxOrigin = o + (rxOrigin - o) * s;
        ryOrigin = o + (ryOrigin - o) * s;
        rxDirection = d + (rxDirection - d) * s;
        ryDirection = d + (ryDirection - d) * s;
    }
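For example, an integrator taking spp samples per pixel might scale a camera ray’s differentials as follows (a sketch of the idea only; the corresponding fragment in Chapter 1 may differ in its details).

RayDifferential ray(Point3f(0, 0, 0), Vector3f(0, 0, 1));
// (A Camera would also have set hasDifferentials, rxOrigin, and so on.)
int spp = 16;   // samples per pixel
ray.ScaleDifferentials(1 / std::sqrt((Float)spp));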
Many parts of the system operate on axis-aligned regions of space. For example, multi-threading in pbrt is implemented by subdividing the image into 2D rectangular tiles that can be processed independently, and the bounding volume hierarchy in Section 7.3 uses 3D boxes to bound geometric primitives in the scene. The Bounds2 and Bounds3 template classes are used to represent the extent of these sorts of regions. Both are parameterized by a type T that is used to represent the coordinates of their extents. As with the earlier vector math types, we will focus here on the 3D variant, Bounds3, since Bounds2 is effectively a subset of it.
〈Bounds2 Definition〉 ≡
template <typename T>
class Bounds2 {
public:
〈Bounds2 Public Methods〉
〈Bounds2 Public Members〉
};
〈Bounds3 Definition〉 ≡
template <typename T>
class Bounds3 {
public:
〈Bounds3 Public Methods 98〉
〈Bounds3 Public Members 98〉
};
We use the same shorthand as before to define names for commonly used bounding types.
〈Bounds[23][fi] Definitions〉 ≡
using Bounds2f = Bounds2<Float>;
using Bounds2i = Bounds2<int>;
using Bounds3f = Bounds3<Float>;
using Bounds3i = Bounds3<int>;
Bounds2 97
Bounds3 97
Float 23
Ray::d 95
Ray::o 95
RayDifferential::rxDirection 96
RayDifferential::rxOrigin 96
RayDifferential::ryDirection 96
RayDifferential::ryOrigin 96
There are a few possible representations for these sorts of bounding boxes; pbrt uses axis-aligned bounding boxes (AABBs), where the box edges are mutually perpendicular and aligned with the coordinate system axes. Another possible choice is oriented bounding boxes (OBBs), where the box edges on different sides are still perpendicular to each other but not necessarily coordinate-system aligned. A 3D AABB can be described by one of its vertices and three lengths, each representing the distance spanned along the x, y, and z coordinate axes. Alternatively, two opposite vertices of the box can describe it. We chose the two-point representation for pbrt’s Bounds2 and Bounds3 classes; they store the positions of the vertex with minimum coordinate values and of the one with maximum coordinate values. A 2D illustration of a bounding box and its representation is shown in Figure 3.9.
Figure 3.9: An Axis-Aligned Bounding Box. The Bounds2 and Bounds3 classes store only the coordinates of the minimum and maximum points of the box; the other box corners are implicit in this representation.
〈Bounds3 Public Members〉 ≡
    Point3<T> pMin, pMax;
The default constructors create an empty box by setting the extent to an invalid configuration, which violates the invariant that pMin.x <= pMax.x (and similarly for the other dimensions). By initializing two corner points with the largest and smallest representable number, any operations involving an empty box (e.g., Union()) will yield the correct result.
〈Bounds3 Public Methods〉 ≡
    Bounds3() {
        T minNum = std::numeric_limits<T>::lowest();
        T maxNum = std::numeric_limits<T>::max();
        pMin = Point3<T>(maxNum, maxNum, maxNum);
        pMax = Point3<T>(minNum, minNum, minNum);
    }
It is also useful to be able to initialize bounds that enclose just a single point:
〈Bounds3 Public Methods〉 +≡
    explicit Bounds3(Point3<T> p) : pMin(p), pMax(p) {}
If the caller passes two corner points (p1 and p2) to define the box, the constructor needs to find their component-wise minimum and maximum values since it is not necessarily the case that p1.x <= p2.x, and so on.
〈Bounds3 Public Methods〉 +≡
    Bounds3(Point3<T> p1, Point3<T> p2)
        : pMin(Min(p1, p2)), pMax(Max(p1, p2)) {}
It can be useful to use array indexing to select between the two points at the corners of the box. Assertions in the debug build, not shown here, check that the provided index is either 0 or 1.
〈Bounds3 Public Methods〉 +≡
    Point3<T> operator[](int i) const { return (i == 0) ? pMin : pMax; }
    Point3<T> &operator[](int i) { return (i == 0) ? pMin : pMax; }
Bounds2 97
Bounds3 97
Bounds3::pMax 98
Bounds3::pMin 98
Point3 92
Tuple3::Max() 85
Tuple3::Min() 85
The Corner() method returns the coordinates of one of the eight corners of the bounding box. Its logic calls the operator[] method with a zero or one value for each dimension that is based on one of the low three bits of corner and then extracts the corresponding component.
It is worthwhile to verify that this method returns the positions of all eight corners when passed values from 0 to 7 if that is not immediately evident.
〈Bounds3 Public Methods〉 +≡
    Point3<T> Corner(int corner) const {
        return Point3<T>((*this)[(corner & 1)].x,
                         (*this)[(corner & 2) ? 1 : 0].y,
                         (*this)[(corner & 4) ? 1 : 0].z);
    }
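One common use of Corner() is to iterate over all eight corners of a box, for example when computing the bounds of a box after a transformation has been applied to it. A sketch with illustrative values:

Bounds3f b(Point3f(-1, -1, -1), Point3f(1, 1, 1));
Bounds3f ret;
for (int i = 0; i < 8; ++i)
    ret = Union(ret, b.Corner(i));   // here ret ends up equal to b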
Given a bounding box and a point, the Union() function returns a new bounding box that encompasses that point as well as the original bounds.
〈Bounds3 Inline Functions〉 ≡
template <typename T>
Bounds3<T> Union(const Bounds3<T> &b, Point3<T> p) {
Bounds3<T> ret;
ret.pMin = Min(b.pMin, p);
ret.pMax = Max(b.pMax, p);
return ret;
}
One subtlety that applies to this and some of the following functions is that it is important that the pMin and pMax members of ret be set directly here, rather than passing the values returned by Min() and Max() to the Bounds3 constructor. The detail stems from the fact that if the provided bounds are both degenerate, the returned bounds should be degenerate as well. If a degenerate extent is passed to the constructor, then it will sort the coordinate values, which in turn leads to what is essentially an infinite bound.
It is similarly possible to construct a new box that bounds the space encompassed by two other bounding boxes. The definition of this function is similar to the earlier Union() method that takes a Point3f; the difference is that the pMin and pMax of the second box are used for the Min() and Max() tests, respectively.
〈Bounds3 Inline Functions〉 +≡
template <typename T>
Bounds3<T> Union(const Bounds3<T> &b1, const Bounds3<T> &b2) {
Bounds3<T> ret;
ret.pMin = Min(b1.pMin, b2.pMin);
ret.pMax = Max(b1.pMax, b2.pMax);
return ret;
}
The intersection of two bounding boxes can be found by computing the maximum of their two respective minimum coordinates and the minimum of their maximum coordinates. (See Figure 3.10.)
〈Bounds3 Inline Functions〉 +≡
template <typename T>
Bounds3<T> Intersect(const Bounds3<T> &b1, const Bounds3<T> &b2) {
Bounds3<T> b;
b.pMin = Max(b1.pMin, b2.pMin);
b.pMax = Min(b1.pMax, b2.pMax);
return b;
}
Bounds3 97
Bounds3::pMax 98
Bounds3::pMin 98
Point3 92
Point3f 92
Tuple3::Max() 85
Tuple3::Min() 85
Figure 3.10: Intersection of Two Bounding Boxes. Given two bounding boxes with pMin and pMax points denoted by open circles, the bounding box of their area of intersection (shaded region) has a minimum point (lower left filled circle) with coordinates given by the maximum of the coordinates of the minimum points of the two boxes in each dimension. Similarly, its maximum point (upper right filled circle) is given by the minimums of the boxes’ maximum coordinates.
We can also determine if two bounding boxes overlap by seeing if their extents overlap in all of x, y, and z:
〈Bounds3 Inline Functions〉 +≡
template <typename T>
bool Overlaps(const Bounds3<T> &b1, const Bounds3<T> &b2) {
bool x = (b1.pMax.x >= b2.pMin.x) && (b1.pMin.x <= b2.pMax.x);
bool y = (b1.pMax.y >= b2.pMin.y) && (b1.pMin.y <= b2.pMax.y);
bool z = (b1.pMax.z >= b2.pMin.z) && (b1.pMin.z <= b2.pMax.z);
return (x && y && z);
}
Three 1D containment tests determine if a given point is inside a bounding box.
〈Bounds3 Inline Functions〉 +≡
template <typename T>
bool Inside(Point3<T> p, const Bounds3<T> &b) {
return (p.x >= b.pMin.x && p.x <= b.pMax.x &&
p.y >= b.pMin.y && p.y <= b.pMax.y &&
p.z >= b.pMin.z && p.z <= b.pMax.z);
}
The InsideExclusive() variant of Inside() does not consider points on the upper boundary to be inside the bounds. It is mostly useful with integer-typed bounds.
〈Bounds3 Inline Functions〉 +≡
template <typename T>
bool InsideExclusive(Point3<T> p, const Bounds3<T> &b) {
return (p.x >= b.pMin.x && p.x < b.pMax.x &&
p.y >= b.pMin.y && p.y < b.pMax.y &&
p.z >= b.pMin.z && p.z < b.pMax.z);
}
Bounds3 97
Bounds3::pMax 98
Bounds3::pMin 98
Point3 92
DistanceSquared() returns the squared distance from a point to a bounding box or zero if the point is inside it. The geometric setting of the computation is shown in Figure 3.11. After the distance from the point to the box is computed in each dimension, the squared distance is found by summing the squares of each of the 1D distances.
Figure 3.11: Computing the Squared Distance from a Point to an Axis-Aligned Bounding Box. We first find the distance from the point to the box in each dimension. Here, the point represented by an empty circle on the upper left is above and to the left of the box, so its x and y distances are respectively pMin.x - p.x and pMin.y - p.y. The other point represented by an empty circle is to the right of the box but overlaps its extent in the y dimension, giving it respective distances of p.x - pMax.x and zero. The logic in Bounds3::DistanceSquared() computes these distances by finding the maximum of zero and the distances to the minimum and maximum points in each dimension.
〈Bounds3 Inline Functions〉 +≡
template <typename T, typename U>
auto DistanceSquared(Point3<T> p, const Bounds3<U> &b) {
using TDist = decltype(T{} - U{});
TDist dx = std::max<TDist>({0, b.pMin.x - p.x, p.x - b.pMax.x});
TDist dy = std::max<TDist>({0, b.pMin.y - p.y, p.y - b.pMax.y});
TDist dz = std::max<TDist>({0, b.pMin.z - p.z, p.z - b.pMax.z});
return Sqr(dx) + Sqr(dy) + Sqr(dz);
}
It is easy to compute the distance from a point to a bounding box, though some indirection is needed to be able to determine the correct return type using TupleLength.
〈Bounds3 Inline Functions〉 +≡
template <typename T, typename U>
auto Distance(Point3<T> p, const Bounds3<U> &b) {
auto dist2 = DistanceSquared(p, b);
using TDist = typename TupleLength<decltype(dist2)>::type;
return std::sqrt(TDist(dist2));
}
The Expand() function pads the bounding box by a constant amount delta in all dimensions.
〈Bounds3 Inline Functions〉 +≡
template <typename T, typename U>
Bounds3<T> Expand(const Bounds3<T> &b, U delta) {
Bounds3<T> ret;
ret.pMin = b.pMin - Vector3<T>(delta, delta, delta);
ret.pMax = b.pMax + Vector3<T>(delta, delta, delta);
return ret;
}
Diagonal() returns the vector along the box diagonal from the minimum point to the maximum point.
〈Bounds3 Public Methods〉 +≡
Vector3<T> Diagonal() const { return pMax - pMin; }
Methods for computing the surface area of the six faces of the box and the volume inside of it are also useful. (This is a place where Bounds2 and Bounds3 diverge: these methods are not available in Bounds2, though it does have an Area() method.)
〈Bounds3 Public Methods〉 +≡
T SurfaceArea() const {
Vector3<T> d = Diagonal();
return 2 * (d.x * d.y + d.x * d.z + d.y * d.z);
}
〈Bounds3 Public Methods〉 +≡
T Volume() const {
Vector3<T> d = Diagonal();
return d.x * d.y * d.z;
}
The Bounds3::MaxDimension() method returns the index of which of the three axes is longest. This is useful, for example, when deciding which axis to subdivide when building some of the ray-intersection acceleration structures.
〈Bounds3 Public Methods〉 +≡
int MaxDimension() const {
Vector3<T> d = Diagonal();
if (d.x > d.y && d.x > d.z) return 0;
else if (d.y > d.z) return 1;
else return 2;
}
Lerp() linearly interpolates between the corners of the box by the given amount in each dimension.
〈Bounds3 Public Methods〉 +≡
Point3f Lerp(Point3f t) const {
return Point3f(pbrt::Lerp(t.x, pMin.x, pMax.x),
pbrt::Lerp(t.y, pMin.y, pMax.y),
pbrt::Lerp(t.z, pMin.z, pMax.z));
}
Offset() is effectively the inverse of Lerp(). It returns the continuous position of a point relative to the corners of the box, where a point at the minimum corner has offset (0, 0, 0), a point at the maximum corner has offset (1, 1, 1), and so forth.
〈Bounds3 Public Methods〉 +≡
Vector3f Offset(Point3f p) const {
Vector3f o = p - pMin;
if (pMax.x > pMin.x) o.x /= pMax.x - pMin.x;
if (pMax.y > pMin.y) o.y /= pMax.y - pMin.y;
if (pMax.z > pMin.z) o.z /= pMax.z - pMin.z;
return o;
}
Bounds3 also provides a method that returns the center and radius of a sphere that bounds the bounding box. In general, this may give a far looser fit than a sphere that bounded the original contents of the Bounds3 directly, although for some geometric operations it is easier to work with a sphere than a box, in which case the worse fit may be an acceptable trade-off.
〈Bounds3 Public Methods〉 +≡
void BoundingSphere(Point3<T> *center, Float *radius) const {
*center = (pMin + pMax) / 2;
*radius = Inside(*center, *this) ? Distance(*center, pMax) : 0;
}
Straightforward methods test for empty and degenerate bounding boxes. Note that “empty” means that a bounding box has zero volume but does not necessarily imply that it has zero surface area.
〈Bounds3 Public Methods〉 +≡
bool IsEmpty() const {
return pMin.x >= pMax.x || pMin.y >= pMax.y || pMin.z >= pMax.z;
}
bool IsDegenerate() const {
return pMin.x > pMax.x || pMin.y > pMax.y || pMin.z > pMax.z;
}
Finally, for integer bounds, there is an iterator class that fulfills the requirements of a C++ forward iterator (i.e., it can only be advanced). The details are slightly tedious and not particularly interesting, so the code is not included in the book. Having this definition makes it possible to write code using range-based for loops to iterate over integer coordinates in a bounding box:
Bounds2i b = …;
for (Point2i p : b) {
⋮
}
As implemented, the iteration goes up to but does not visit points equal to the maximum extent in each dimension.
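For example, a brief sketch along these lines (a usage illustration, not from pbrt's source) makes the exclusive upper bound concrete:
Bounds2i b(Point2i(0, 0), Point2i(2, 2));
for (Point2i p : b) {
// p takes the values (0, 0), (1, 0), (0, 1), and (1, 1); points with
// x == 2 or y == 2 are not visited.
}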
Geometry on the unit sphere is also frequently useful in rendering. 3D unit direction vectors can equivalently be represented as points on the unit sphere, and sets of directions can be represented as areas on the unit sphere. Useful operations such as bounding a set of directions can often be cleanly expressed as bounds on the unit sphere. We will therefore introduce some useful principles of spherical geometry and related classes and functions in this section.
In 2D, the planar angle is the total angle subtended by some object with respect to some position (Figure 3.12). Consider the unit circle around the point p; if we project the shaded object onto that circle, some length s of the circle will be covered by its projection. That arc length s (which is the same as the angle θ) is the angle subtended by the object. Planar angles are measured in radians and the entire unit circle covers 2π radians.
The solid angle extends the 2D unit circle to a 3D unit sphere (Figure 3.13). The total area s is the solid angle subtended by the object. Solid angles are measured in steradians (sr). The entire sphere subtends a solid angle of 4π sr, and a hemisphere subtends 2π sr.
Figure 3.12: Planar Angle. The planar angle of an object as seen from a point p is equal to the angle it subtends as seen from p or, equivalently, to the length of the arc s on the unit circle.
Figure 3.13: Solid Angle. The solid angle s subtended by a 3D object is computed by projecting the object onto the unit sphere and measuring the area of its projection.
By providing a way to measure area on the unit sphere (and thus over the unit directions), the solid angle also provides the foundation for a measure for integrating spherical functions; the differential solid angle dω corresponds to the differential area measure on the unit sphere.
We will sometimes find it useful to consider the set of directions from a point to the surface of a polygon. (Doing so can be useful, for example, when computing the illumination arriving at a point from an emissive polygon.) If a regular planar polygon is projected onto the unit sphere, it forms a spherical polygon.
A vertex of a spherical polygon can be found by normalizing the vector from the center of the sphere to the corresponding vertex of the original polygon. Each edge of a spherical polygon is given by the intersection of the unit sphere with the plane that goes through the sphere’s center and the corresponding two vertices of the polygon. The result is a great circle on the sphere that is the shortest distance between the two vertices on the surface of the sphere (Figure 3.14).
Figure 3.14: A spherical polygon corresponds to the projection of a polygon onto the unit sphere. Its vertices correspond to the unit vectors to the original polygon’s vertices and its edges are defined by the intersection of the sphere and the planes that go through the sphere’s center and two vertices of the polygon.
Figure 3.15: A Spherical Triangle. Each vertex’s angle is labeled with the Greek letter corresponding to the letter used for its vertex.
The angle at each vertex is given by the angle between the planes corresponding to the two edges that meet at the vertex (Figure 3.15). (The angle between two planes is termed their dihedral angle.) We will label the angle at each vertex with the Greek letter that corresponds to its label (α for the vertex a and so forth). Unlike planar triangles, the three angles of a spherical triangle do not sum to π radians; rather, their sum is π + A, where A is the spherical triangle's area. Given the angles α, β, and γ, it follows that the area of a spherical triangle can be computed using Girard's theorem, which says that a triangle's surface area A on the unit sphere is given by the "excess angle"
A = α + β + γ − π. (3.5)
Direct implementation of Equation (3.5) requires multiple calls to expensive inverse trigonometric functions, and its computation can be prone to error due to floating-point cancellation. A more efficient and accurate approach is to apply the relationship
tan(A/2) = (a · (b × c)) / (1 + (a · b) + (a · c) + (b · c)),
which can be derived from Equation (3.5) using spherical trigonometric identities. That approach is used in SphericalTriangleArea(), which takes three vectors on the unit sphere corresponding to the spherical triangle’s vertices.
〈Spherical Geometry Inline Functions〉 ≡
Float SphericalTriangleArea(Vector3f a, Vector3f b, Vector3f c) {
return std::abs(2 * std::atan2(Dot(a, Cross(b, c)),
1 + Dot(a, b) + Dot(a, c) + Dot(b, c)));
}
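As a quick sanity check (an added example, not part of pbrt's source), the spherical triangle whose vertices lie on the three coordinate axes covers one octant of the sphere, so its area should be 4π/8 = π/2:
Vector3f a(1, 0, 0), b(0, 1, 0), c(0, 0, 1);
// One octant of the unit sphere: the returned area is approximately Pi / 2.
Float octant = SphericalTriangleArea(a, b, c);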
The area of a quadrilateral projected onto the unit sphere is given by α + β + γ + δ − 2π, where α, β, γ, and δ are its interior angles. This value is computed by SphericalQuadArea(), which takes the vertex positions on the unit sphere. Its implementation is very similar to SphericalTriangleArea(), so it is not included here.
〈Spherical Geometry Inline Functions〉 +≡
Float SphericalQuadArea(Vector3f a, Vector3f b, Vector3f c, Vector3f d);
3.8.3 SPHERICAL PARAMETERIZATIONS
The 3D Cartesian coordinates of a point on the unit sphere are not always the most convenient representation of a direction. For example, if we are tabulating a function over the unit sphere, a 2D parameterization that takes advantage of the fact that the sphere’s surface is two-dimensional is preferable.
There are a variety of mappings between 2D and the sphere. Developing such mappings that fulfill various goals has been an important part of map making since its beginnings. It can be shown that any mapping from the plane to the sphere introduces some form of distortion; the task then is to choose a mapping that best fulfills the requirements for a particular application. pbrt thus uses three different spherical parameterizations, each with different advantages and disadvantages.
Spherical Coordinates
Spherical coordinates (θ, ϕ) are a well-known parameterization of the sphere. For a general sphere of radius r, they are related to Cartesian coordinates by
x = r sin θ cos ϕ
y = r sin θ sin ϕ
z = r cos θ.
(See Figure 3.16.)
For convenience, we will define a SphericalDirection() function that converts a θ and ϕ pair into a unit (x, y, z) vector, applying these equations directly. Notice that the function is given the sine and cosine of θ, rather than θ itself. This is because the sine and cosine of θ are often already available to the caller. This is not normally the case for ϕ, however, so ϕ is passed in as is.
〈Spherical Geometry Inline Functions〉 +≡
Vector3f SphericalDirection(Float sinTheta, Float cosTheta, Float phi) {
return Vector3f(Clamp(sinTheta, -1, 1) * std::cos(phi),
Clamp(sinTheta, -1, 1) * std::sin(phi),
Clamp(cosTheta, -1, 1));
}
Figure 3.16: A direction vector can be written in terms of spherical coordinates (θ, ϕ) if the x, y, and z basis vectors are given as well. The spherical angle formulae make it easy to convert between the two representations.
The conversion of a direction (x, y, z) to spherical coordinates can be found by
θ = arccos z
ϕ = arctan(y/x),
where the arctangent must be computed so that it respects the quadrant of (x, y), as std::atan2() does.
The corresponding functions follow. Note that SphericalTheta() assumes that the vector v has been normalized before being passed in; using SafeACos() in place of std::acos() avoids errors if |v.z| is slightly greater than 1 due to floating-point round-off error.
〈Spherical Geometry Inline Functions〉 +≡
Float SphericalTheta(Vector3f v) { return SafeACos(v.z); }
SphericalPhi() returns an angle in [0, 2π], which sometimes requires an adjustment to the value returned by std::atan2().
〈Spherical Geometry Inline Functions〉 +≡
Float SphericalPhi(Vector3f v) {
Float p = std::atan2(v.y, v.x);
return (p < 0) ? (p + 2 * Pi) : p;
}
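A short round-trip sketch (illustrative only, using the functions just defined) shows how these conversions compose with SphericalDirection() for a normalized direction:
Vector3f v = Normalize(Vector3f(1, 2, 3));
Float theta = SphericalTheta(v), phi = SphericalPhi(v);
// Reconstructs v, up to floating-point round-off.
Vector3f v2 = SphericalDirection(std::sin(theta), std::cos(theta), phi);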
Given a direction vector ω, it is easy to compute quantities like the cosine of the angle θ:
cos θ = ((0, 0, 1) · ω) = ωz.
This is a much more efficient computation than it would have been to compute ω’s θ value using first an expensive inverse trigonometric function to compute θ and then another expensive function to compute its cosine. The following functions compute this cosine and a few useful variations.
〈Spherical Geometry Inline Functions〉 +≡
Float CosTheta(Vector3f w) { return w.z; }
Float Cos2Theta(Vector3f w) { return Sqr(w.z); }
Float AbsCosTheta(Vector3f w) { return std::abs(w.z); }
The value of sin2 θ can be efficiently computed using the trigonometric identity sin2 θ + cos2 θ = 1, though we need to be careful to avoid returning a negative value in the rare case that 1 - Cos2Theta(w) is less than zero due to floating-point round-off error.
Figure 3.17: The values of sin ϕ and cos ϕ can be computed using the circular coordinate equations x = r cos ϕ and y = r sin ϕ, where r, the length of the dashed line, is equal to sin θ.
〈Spherical Geometry Inline Functions〉 +≡
Float Sin2Theta(Vector3f w) { return std::max<Float>(0, 1 - Cos2Theta(w)); }
Float SinTheta(Vector3f w) { return std::sqrt(Sin2Theta(w)); }
The tangent of the angle θ can be computed via the identity tan θ = sin θ/cos θ.
〈Spherical Geometry Inline Functions〉 +≡
Float TanTheta(Vector3f w) { return SinTheta(w) / CosTheta(w); }
Float Tan2Theta(Vector3f w) { return Sin2Theta(w) / Cos2Theta(w); }
The sine and cosine of the ϕ angle can also be easily found from (x, y, z) coordinates without using inverse trigonometric functions (Figure 3.17). In the z = 0 plane, the vector ω has coordinates (x, y), which are given by r cos ϕ and r sin ϕ, respectively. The radius r is sin θ, so
cos ϕ = x / sin θ,   sin ϕ = y / sin θ.
〈Spherical Geometry Inline Functions〉 +≡
Float CosPhi(Vector3f w) {
Float sinTheta = SinTheta(w);
return (sinTheta == 0) ? 1 : Clamp(w.x / sinTheta, -1, 1);
}
Float SinPhi(Vector3f w) {
Float sinTheta = SinTheta(w);
return (sinTheta == 0) ? 0 : Clamp(w.y / sinTheta, -1, 1);
}
Finally, the cosine of the angle Δϕ between two vectors’ ϕ values can be found by zeroing their z coordinates to get 2D vectors in the z = 0 plane and then normalizing them. The dot product of these two vectors gives the cosine of the angle between them. The implementation below rearranges the terms a bit for efficiency so that only a single square root operation needs to be performed.
〈Spherical Geometry Inline Functions〉 +≡
Float CosDPhi(Vector3f wa, Vector3f wb) {
Float waxy = Sqr(wa.x) + Sqr(wa.y), wbxy = Sqr(wb.x) + Sqr(wb.y);
if (waxy == 0 || wbxy == 0) return 1;
return Clamp((wa.x * wb.x + wa.y * wb.y) / std::sqrt(waxy * wbxy),
-1, 1);
}
Parameterizing the sphere with spherical coordinates corresponds to the equirectangular mapping of the sphere. It is not a particularly good parameterization for representing regularly sampled data on the sphere due to substantial distortion at the sphere’s poles.
Octahedral Encoding
While Vector3f is a convenient representation for computation using unit vectors, it does not use storage efficiently: not only does it use 12 bytes of memory (assuming 4-byte Floats), but it is capable of representing 3D direction vectors of arbitrary length. Normalized vectors are a small subset of all the possible Vector3fs, however, which means that the storage represented by those 12 bytes is not well allocated for them. When many normalized vectors need to be stored in memory, a more space-efficient representation can be worthwhile.
Spherical coordinates could be used for this task. Doing so would reduce the storage required to two Floats, though with the disadvantage that relatively expensive trigonometric and inverse trigonometric functions would be required to convert to and from Vector3s. Further, spherical coordinates provide more precision near the poles and less near the equator; a more equal distribution of precision across all unit vectors is preferable. (Due to the way that floating-point numbers are represented, Vector3f suffers from providing different precision in different parts of the unit sphere as well.)
OctahedralVector provides a compact representation for unit vectors with an even distribution of precision and efficient encoding and decoding routines. Our implementation uses just 4 bytes of memory for each unit vector; all the possible values of those 4 bytes correspond to a valid unit vector. Its representation is not suitable for computation, but it is easy to convert between it and Vector3f, which makes it an appealing option for in-memory storage of normalized vectors.
〈OctahedralVector Definition〉 ≡
class OctahedralVector {
public:
〈OctahedralVector Public Methods 110〉
private:
〈OctahedralVector Private Methods 110〉
〈OctahedralVector Private Members 110〉
};
As indicated by its name, this unit vector is based on an octahedral mapping of the unit sphere that is illustrated in Figure 3.18.
The algorithm to convert a unit vector to this representation is surprisingly simple. The first step is to project the vector onto the faces of the 3D octahedron; this can be done by dividing the vector components by the vector’s L1 norm, |vx| + |vy| + |vz|. For points in the upper hemisphere (i.e., with vz ≥ 0), projection down to the z = 0 plane then just requires taking the x and y components directly.
Figure 3.18: The OctahedralVector’s parameterization of the unit sphere can be understood by first considering (a) an octahedron inscribed in the sphere. Its 2D parameterization is then defined by (b) flattening the top pyramid into the z = 0 plane and (c) unwrapping the bottom half and projecting its triangles onto the same plane. (d) The result allows a simple [−1, 1]2 parameterization. (Figure after Figure 2 in Meyer et al. (2010).)
〈OctahedralVector Public Methods〉 ≡
OctahedralVector(Vector3f v) {
v /= std::abs(v.x) + std::abs(v.y) + std::abs(v.z);
if (v.z >= 0) {
x = Encode(v.x);
y = Encode(v.y);
} else {
〈Encode octahedral vector with z < 0 110〉
}
}
For directions in the lower hemisphere, the reprojection to the appropriate point in [−1, 1]2 is slightly more complex, though it can be expressed without any conditional control flow with a bit of care. (Here is another concise fragment of code that is worth understanding; consider in comparison code based on if statements that handled unwrapping the four triangles independently.)
〈Encode octahedral vector with z < 0〉 ≡
x = Encode((1 - std::abs(v.y)) * Sign(v.x));
y = Encode((1 - std::abs(v.x)) * Sign(v.y));
The helper function OctahedralVector::Sign() uses the standard math library function std::copysign() to return ±1 according to the sign of v (positive/negative zero are treated like ordinary numbers).
〈OctahedralVector Private Methods〉 ≡
static Float Sign(Float v) { return std::copysign(1.f, v); }
The 2D parameterization in Figure 3.18(d) is then represented using a 16-bit value for each coordinate that quantizes the range [−1, 1] with 2^16 steps.
〈OctahedralVector Private Members〉 ≡
uint16_t x, y;
Encode() performs the encoding from a value in [−1, 1] to the integer encoding.
〈OctahedralVector Private Methods〉 +≡
static uint16_t Encode(Float f) {
return pstd::round(Clamp((f + 1) / 2, 0, 1) * 65535.f);
}
The mapping back to a Vector3f follows the same steps in reverse. For directions in the upper hemisphere, the z value on the octahedron face is easily found. Normalizing that vector then gives the corresponding unit vector.
〈OctahedralVector Public Methods〉 +≡
explicit operator Vector3f() const {
Vector3f v;
v.x = -1 + 2 * (x / 65535.f);
v.y = -1 + 2 * (y / 65535.f);
v.z = 1 - (std::abs(v.x) + std::abs(v.y));
〈Reparameterize directions in the z < 0 portion of the octahedron 111〉
return Normalize(v);
}
For directions in the lower hemisphere, the inverse of the mapping implemented in the 〈Encode octahedral vector with z < 0〉 fragment must be performed before the direction is normalized.
〈Reparameterize directions in the z < 0 portion of the octahedron〉 ≡
if (v.z < 0) {
Float xo = v.x;
v.x = (1 - std::abs(v.y)) * Sign(xo);
v.y = (1 - std::abs(xo)) * Sign(v.y);
}
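A brief round trip (an illustrative sketch, not from pbrt's source) shows the intended use: a normalized Vector3f is encoded for compact storage and later decoded, with error bounded by the 16-bit quantization:
Vector3f d = Normalize(Vector3f(0.3f, -0.5f, 0.8f));
OctahedralVector octd(d);        // 4 bytes of storage instead of 12
Vector3f d2 = Vector3f(octd);    // close to d, within quantization error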
Equal-Area Mapping
The third spherical parameterization used in pbrt is carefully designed to preserve area: any area on the surface of the sphere maps to a proportional area in the parametric domain. This representation is a good choice for tabulating functions on the sphere, as it is continuous, has reasonably low distortion, and all values stored represent the same solid angle. It combines the octahedral mapping used in the OctahedralVector class with a variant of the square-to-disk mapping from Section A.5.1, which maps the unit square to the hemisphere in a way that preserves area. The mapping splits the unit square into four sectors, each of which is mapped to a sector of the hemisphere (see Figure 3.19).
Given (u, v) ∈ [−1, 1]², in the first sector where u ≥ 0 and u − |v| ≥ 0, defining the polar coordinates of a point on the unit disk by
r = u,   ϕ = (π/4)(v/u)
gives an area-preserving mapping with ϕ ∈ [−π/4, π/4]. Similar mappings can be found for the other three sectors.
Given (r, ϕ), the corresponding point on the positive hemisphere is then given by
(x, y, z) = (r cos ϕ √(2 − r²), r sin ϕ √(2 − r²), 1 − r²).
This mapping is also area-preserving.
This mapping can be extended to the entire sphere using the same octahedral mapping that was used for the OctahedralVector. There are then three steps: the point is first mirrored into the positive quadrant by taking the absolute values of its coordinates, the hemispherical mapping above is applied there, and finally the signs of the original coordinates and the side of the diagonal on which the point lies are used to place the result in the correct octant of the sphere.
Figure 3.19: The uniform hemispherical mapping (a) first transforms the unit square to the unit disk so that the four shaded sectors of the square are mapped to the corresponding shaded sectors of the disk. (b) Points on the disk are then mapped to the hemisphere in a manner that preserves relative area.
The following implementation of this approach takes some care to be branch free: no matter what the input value is, there is a single path of control flow through the function. This characteristic is often helpful for performance, especially on the GPU, though we note that this function usually represents a small fraction of pbrt's execution time, so this characteristic does not affect the system's overall performance.
〈Square–Sphere Mapping Function Definitions〉 ≡
Vector3f EqualAreaSquareToSphere(Point2f p) {
〈Transform p to [−1, 1]2 and compute absolute values 113〉
〈Compute radius r as signed distance from diagonal 113〉
〈Compute angle ϕ for square to sphere mapping 113〉
〈Find z coordinate for spherical direction 113〉
〈Compute cos ϕ and sin ϕ for original quadrant and return vector 113〉
}
After transforming the original point p in [0, 1]2 to (u, v) ∈ [−1, 1]2, the implementation also computes the absolute value of these coordinates u′ = |u| and v′ = |v|. Doing so remaps the three quadrants with one or two negative coordinate values to the positive quadrant, flipping each quadrant so that its upper hemisphere is mapped to u′ + v′ < 1, which corresponds to the upper hemisphere in the original positive quadrant. (Each lower hemisphere is also mapped to the u′ + v′ > 1 region, corresponding to the original negative quadrant.)
Figure 3.20: Computation of the Radius r for the Square-to-Disk Mapping. The signed distance to the u′ + v′ = 1 line is computed. One minus its absolute value gives a radius between 0 and 1.
〈Transform p to [−1, 1]2 and compute absolute values〉 ≡
Float u = 2 * p.x - 1, v = 2 * p.y - 1;
Float up = std::abs(u), vp = std::abs(v);
Most of this function’s implementation operates using (u′, v′) in the positive quadrant. Its next step is to compute the radius r for the mapping to the disk by computing the signed distance to the u + v = 1 diagonal that splits the upper and lower hemispheres where the lower hemisphere’s signed distance is negative (Figure 3.20).
〈Compute radius r as signed distance from diagonal〉 ≡
Float signedDistance = 1 - (up + vp);
Float d = std::abs(signedDistance);
Float r = 1 - d;
The ϕ computation accounts for the 45° rotation with an added π/4 term.
〈Compute angle ϕ for square to sphere mapping〉 ≡
Float phi = (r == 0 ? 1 : (vp - up) / r + 1) * Pi / 4;
The sign of the signed distance computed earlier indicates whether the (u′, v′) point is in the lower hemisphere; the returned z coordinate takes its sign.
〈Find z coordinate for spherical direction〉 ≡
Float z = pstd::copysign(1 - Sqr(r), signedDistance);
After computing cos ϕ and sin ϕ in the positive quadrant, it is necessary to remap those values to the correct ones for the actual quadrant of the original point (u, v). Associating the sign of u with the computed cos ϕ value and the sign of v with sin ϕ suffices to do so and this operation can be done with another use of copysign().
〈Compute cos ϕ and sin ϕ for original quadrant and return vector〉 ≡
Float cosPhi = pstd::copysign(std::cos(phi), u);
Float sinPhi = pstd::copysign(std::sin(phi), v);
return Vector3f(cosPhi * r * SafeSqrt(2 - Sqr(r)),
sinPhi * r * SafeSqrt(2 - Sqr(r)), z);
The inverse mapping is performed by the EqualAreaSphereToSquare() function, which effectively performs the same operations in reverse and is therefore not included here. Also useful and also not included, WrapEqualAreaSquare() handles the boundary cases of points p that are just outside of [0, 1]2 (as may happen during bilinear interpolation with image texture lookups) and wraps them around to the appropriate valid coordinates that can be passed to EqualAreaSquareToSphere().
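Two consequences of the fragments above are worth noting as a check (illustrative examples, not from pbrt's source): the center of the square maps to the +z pole, and the square's corners all map to the −z pole:
Vector3f top = EqualAreaSquareToSphere(Point2f(0.5f, 0.5f));    // (0, 0, 1)
Vector3f bottom = EqualAreaSquareToSphere(Point2f(0, 0));       // (0, 0, -1)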
Figure 3.21: Bounding a Set of Directions with a Cone. A set of directions, shown here as a shaded region on the sphere, can be bounded using a cone described by a central direction vector v and a spread angle θ set such that all the directions in the set are inside the cone.
In addition to bounding regions of space, it is also sometimes useful to bound a set of directions. For example, if a light source emits illumination in some directions but not others, that information can be used to cull that light source from being included in lighting calculations for points it certainly does not illuminate. pbrt provides the DirectionCone class for such uses; it represents a cone that is parameterized by a central direction and an angular spread (see Figure 3.21).
〈DirectionCone Definition〉 ≡
class DirectionCone {
public:
〈DirectionCone Public Methods 114〉
〈DirectionCone Public Members 114〉
};
The DirectionCone provides a variety of constructors, including one that takes the central axis of the cone and the cosine of its spread angle and one that bounds a single direction. For both the constructor parameters and the cone representation stored in the class, the cosine of the spread angle is used rather than the angle itself. Doing so makes it possible to perform some of the following operations with DirectionCones using efficient dot products in place of more expensive trigonometric functions.
〈DirectionCone Public Methods〉 ≡
DirectionCone() = default;
DirectionCone(Vector3f w, Float cosTheta)
: w(Normalize(w)), cosTheta(cosTheta) {}
explicit DirectionCone(Vector3f w) : DirectionCone(w, 1) {}
The default DirectionCone is empty; an invalid value of infinity for cosTheta encodes that case.
〈DirectionCone Public Members〉 ≡
Vector3f w;
Float cosTheta = Infinity;
A convenience method reports whether the cone is empty.
〈DirectionCone Public Methods〉 +≡
bool IsEmpty() const { return cosTheta == Infinity; }
Another convenience method provides the bound for all directions.
〈DirectionCone Public Methods〉 +≡
static DirectionCone EntireSphere() {
return DirectionCone(Vector3f(0, 0, 1), -1);
}
Given a DirectionCone, it is easy to check if a given direction vector is inside its bounds: the cosine of the angle between the direction and the cone’s central direction must be greater than the cosine of the cone’s spread angle. (Note that for the angle to be smaller, the cosine must be larger.)
〈DirectionCone Inline Functions〉 ≡
bool Inside(const DirectionCone &d, Vector3f w) {
return !d.IsEmpty() && Dot(d.w, Normalize(w)) >= d.cosTheta;
}
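For example, the following sketch (illustrative, not from pbrt's source) builds a cone with a 45-degree spread around +z and tests two directions against it:
DirectionCone cone(Vector3f(0, 0, 1), std::cos(Radians(45)));
bool insideUp = Inside(cone, Vector3f(0, 0, 1));    // true: along the central axis
bool insideX = Inside(cone, Vector3f(1, 0, 0));     // false: 90 degrees from the axis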
BoundSubtendedDirections() returns a DirectionCone that bounds the directions subtended by a given bounding box with respect to a point p.
〈DirectionCone Inline Functions〉 +≡
DirectionCone BoundSubtendedDirections(const Bounds3f &b, Point3f p) {
〈Compute bounding sphere for b and check if p is inside 115〉
〈Compute and return DirectionCone for bounding sphere 115〉
}
First, a bounding sphere is found for the bounds b. If the given point p is inside the sphere, then a direction bound of all directions is returned. Note that the point p may be inside the sphere but outside b, in which case the returned bounds will be overly conservative. This issue is discussed further in an exercise at the end of the chapter.
〈Compute bounding sphere for b and check if p is inside〉 ≡
Float radius;
Point3f pCenter;
b.BoundingSphere(&pCenter, &radius);
if (DistanceSquared(p, pCenter) < Sqr(radius))
return DirectionCone::EntireSphere();
Otherwise the central axis of the bounds is given by the vector from p to the center of the sphere and the cosine of the spread angle is easily found using basic trigonometry (see Figure 3.22).
〈Compute and return DirectionCone for bounding sphere〉 ≡
Vector3f w = Normalize(pCenter - p);
Float sin2ThetaMax = Sqr(radius) / DistanceSquared(pCenter, p);
Float cosThetaMax = SafeSqrt(1 - sin2ThetaMax);
return DirectionCone(w, cosThetaMax);
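A short usage sketch (illustrative only) bounds the directions from a point to a box; the resulting cone's axis points from p toward the center of the box's bounding sphere:
Bounds3f box(Point3f(-1, -1, -1), Point3f(1, 1, 1));
DirectionCone dc = BoundSubtendedDirections(box, Point3f(0, 0, 10));
// dc.w is approximately (0, 0, -1); any direction from (0, 0, 10) toward
// the box satisfies Inside(dc, direction).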
Finally, we will find it useful to be able to take the union of two DirectionCones, finding a DirectionCone that bounds both of them.
Figure 3.22: Finding the Angle That a Bounding Sphere Subtends from a Point p. Given a bounding sphere and a reference point p outside of the sphere, the cosine of the angle θ can be found by first computing sin θ by dividing the sphere’s radius r by the distance d between p and the sphere’s center and then using the identity sin2 θ + cos2 θ = 1.
〈DirectionCone Function Definitions〉 ≡
DirectionCone Union(const DirectionCone &a, const DirectionCone &b) {
〈Handle the cases where one or both cones are empty 116〉
〈Handle the cases where one cone is inside the other 116〉
〈Compute the spread angle of the merged cone, θo 117〉
〈Find the merged cone’s axis and return cone union 118〉
}
If one of the cones is empty, we can immediately return the other one.
〈Handle the cases where one or both cones are empty〉 ≡
if (a.IsEmpty()) return b;
if (b.IsEmpty()) return a;
Otherwise the implementation computes a few angles that will be helpful, including the actual spread angle of each cone as well as the angle between their two central direction vectors. These values give enough information to determine if one cone is entirely bounded by the other (see Figure 3.23).
〈Handle the cases where one cone is inside the other〉 ≡
Float theta_a = SafeACos(a.cosTheta), theta_b = SafeACos(b.cosTheta);
Float theta_d = AngleBetween(a.w, b.w);
if (std::min(theta_d + theta_b, Pi) <= theta_a) return a;
if (std::min(theta_d + theta_a, Pi) <= theta_b) return b;
Otherwise it is necessary to compute a new cone that bounds both of them. As illustrated in Figure 3.24, the sum of θa, θd, and θb gives the full angle that the new cone must cover; half of that is its spread angle.
Figure 3.23: Determining If One Cone of Directions Is Entirely inside Another. Given two direction cones a and b, their spread angles θa and θb, and the angle between their two central direction vectors θd, we can determine if one cone is entirely inside the other. Here, θa > θd + θb, and so b is inside a.
Figure 3.24: Computing the Spread Angle of the Direction Cone That Bounds Two Others. If θd is the angle between two cones’ central axes and the two cones have spread angles θa and θb, then the total angle that the cone bounds is θa + θd + θb and so its spread angle is half of that.
〈Compute the spread angle of the merged cone, θo〉 ≡
Float theta_o = (theta_a + theta_d + theta_b) / 2;
if (theta_o >= Pi)
return DirectionCone::EntireSphere();
The direction vector for the new cone should not be set with the average of the two cones’ direction vectors; that vector and a spread angle of θo do not necessarily bound the two given cones. Using that vector would require a spread angle of θd/2 + max(θa, θb), which is never less than θo. (It is worthwhile to sketch out a few cases on paper to convince yourself of this.)
Instead, we find the vector perpendicular to the cones’ direction vectors using the cross product and rotate a.w by the angle around that axis that causes it to bound both cones’ angles. (The Rotate() function used for this will be introduced shortly, in Section 3.9.7.) In the case that LengthSquared(wr) == 0, the vectors face in opposite directions and a bound of the entire sphere is returned.
〈Find the merged cone’s axis and return cone union〉 ≡
Float theta_r = theta_o - theta_a;
Vector3f wr = Cross(a.w, b.w);
if (LengthSquared(wr) == 0)
return DirectionCone::EntireSphere();
Vector3f w = Rotate(Degrees(theta_r), wr)(a.w);
return DirectionCone(w, std::cos(theta_o));
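As an illustration (not from pbrt's source), taking the union of two single-direction cones along +z and +x yields a cone whose axis lies halfway between them with a 45-degree spread:
DirectionCone dz(Vector3f(0, 0, 1)), dx(Vector3f(1, 0, 0));
DirectionCone both = Union(dz, dx);
// both.w lies halfway between +z and +x, and both.cosTheta is cos(45 degrees),
// so Inside(both, .) is true for both of the original directions.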
In general, a transformation T is a mapping from points to points and from vectors to vectors:
p′ = T(p) v′ = T(v).
The transformation T may be an arbitrary procedure. However, we will consider only a subset of all possible transformations in this chapter. In particular, they will be linear, continuous, and one-to-one and invertible, so that each transformation has an inverse that maps transformed points and vectors back to the originals.
We will often want to take a point, vector, or normal defined with respect to one coordinate frame and find its coordinate values with respect to another frame. Using basic properties of linear algebra, a 4 × 4 matrix can be shown to express the linear transformation of a point or vector from one frame to another. Furthermore, such a 4 × 4 matrix suffices to express all linear transformations of points and vectors within a fixed frame, such as translation in space or rotation around a point. Therefore, there are two different (and incompatible!) ways that a matrix can be interpreted: as a transformation of points and vectors within a single frame, or as a transformation that maps points and vectors from one frame to another.
Most uses of transformations in pbrt are for transforming points from one frame to another.
In general, transformations make it possible to work in the most convenient coordinate space. For example, we can write routines that define a virtual camera, assuming that the camera is located at the origin, looks down the z axis, and has the y axis pointing up and the x axis pointing right. These assumptions greatly simplify the camera implementation. To place the camera at any point in the scene looking in any direction, we construct a transformation that maps points in the scene’s coordinate system to the camera’s coordinate system. (See Section 5.1.1 for more information about camera coordinate spaces in pbrt.)
Given a frame defined by (po, v1, v2, v3), there is ambiguity between the representation of a point (px, py, pz) and a vector (vx, vy, vz) with the same (x, y, z) coordinates. Using the representations of points and vectors introduced at the start of the chapter, we can write the point as the inner product [s1 s2 s3 1][v1 v2 v3 po]^T and the vector as the inner product [s1 s2 s3 0][v1 v2 v3 po]^T. These four-vectors of three si values and a zero or one are called the homogeneous representations of the point and the vector. The fourth coordinate of the homogeneous representation is sometimes called the weight. For a point, its value can be any scalar other than zero: the homogeneous points [1, 3, −2, 1] and [−2, −6, 4, −2] describe the same Cartesian point (1, 3, −2). Converting homogeneous points into ordinary points entails dividing the first three components by the weight: (x, y, z, w) → (x/w, y/w, z/w).
We will use these facts to see how a transformation matrix can describe how points and vectors in one frame can be mapped to another frame. Consider a matrix M that describes the transformation from one coordinate system to another:
M =
[ m_{0,0} m_{0,1} m_{0,2} m_{0,3} ]
[ m_{1,0} m_{1,1} m_{1,2} m_{1,3} ]
[ m_{2,0} m_{2,1} m_{2,2} m_{2,3} ]
[ m_{3,0} m_{3,1} m_{3,2} m_{3,3} ]
(In this book, we define matrix element indices starting from zero, so that equations and source code correspond more directly.) Then if the transformation represented by M is applied to the x axis vector (1, 0, 0), we have
Mx = M[1 0 0 0]^T = [m_{0,0} m_{1,0} m_{2,0} m_{3,0}]^T.
Thus, directly reading the columns of the matrix shows how the basis vectors and the origin of the current coordinate system are transformed by the matrix:
My = [m_{0,1} m_{1,1} m_{2,1} m_{3,1}]^T
Mz = [m_{0,2} m_{1,2} m_{2,2} m_{3,2}]^T
Mp = [m_{0,3} m_{1,3} m_{2,3} m_{3,3}]^T.
In general, by characterizing how the basis is transformed, we know how any point or vector specified in terms of that basis is transformed. Because points and vectors in a coordinate system are expressed in terms of the coordinate system’s frame, applying the transformation to them directly is equivalent to applying the transformation to the coordinate system’s basis and finding their coordinates in terms of the transformed basis.
We will not use homogeneous coordinates explicitly in our code; there is no Homogeneous Point class in pbrt. However, the various transformation routines in the next section will implicitly convert points, vectors, and normals to homogeneous form, transform the homogeneous points, and then convert them back before returning the result. This isolates the details of homogeneous coordinates in one place (namely, the implementation of transformations).
3.9.2 Transform CLASS DEFINITION
The Transform class represents a 4 × 4 transformation. Its implementation is in the files util/transform.h and util/transform.cpp.
〈Transform Definition〉 ≡
class Transform {
public:
〈Transform Public Methods 120〉
private:
〈Transform Private Members 120〉
};
The transformation matrix is represented by the elements of the matrix m, which is represented by a SquareMatrix<4> object. (The SquareMatrix class is defined in Section B.2.12.) The matrix m is stored in row-major form, so element m[i][j] corresponds to mi,j, where i is the row number and j is the column number. For convenience, the Transform also stores the inverse of m in its Transform::mInv member variable; for pbrt’s needs, it is better to have the inverse easily available than to repeatedly compute it as needed.
〈Transform Private Members〉 ≡
SquareMatrix<4> m, mInv;
This representation of transformations is relatively memory hungry: assuming 4 bytes of storage for a Float value, a Transform requires 128 bytes of storage. Used naïvely, this approach can be wasteful; if a scene has millions of shapes but only a few thousand unique transformations, there is no reason to redundantly store the same matrices many times. Therefore, Shapes in pbrt store a pointer to a Transform and the scene specification code defined in Section C.2.3 uses an InternCache of Transforms to ensure that all shapes that share the same transformation point to a single instance of that transformation in memory.
When a new Transform is created, it defaults to the identity transformation—the transformation that maps each point and each vector to itself. This transformation is represented by the identity matrix:
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
The implementation here relies on the default SquareMatrix constructor to fill in the identity matrix for m and mInv.
〈Transform Public Methods〉 ≡
Transform() = default;
A Transform can also be created from a given matrix. In this case, the matrix must be explicitly inverted.
〈Transform Public Methods〉 +≡
Transform(const SquareMatrix<4> &m) : m(m) {
pstd::optional<SquareMatrix<4>> inv = Inverse(m);
if (inv) mInv = *inv;
else {
〈Initialize mInv with not-a-number values 121〉
}
}
If the matrix provided by the caller is degenerate and cannot be inverted, mInv is initialized with floating-point not-a-number values, which poison computations that involve them: arithmetic performed using a not-a-number value always gives a not-a-number value. In this way, a caller who provides a degenerate matrix m can still use the Transform as long as no methods that access mInv are called.
〈Initialize mInv with not-a-number values〉 ≡
Float NaN = std::numeric_limits<Float>::has_signaling_NaN
? std::numeric_limits<Float>::signaling_NaN()
: std::numeric_limits<Float>::quiet_NaN();
for (int i = 0; i < 4; ++i)
for (int j = 0; j < 4; ++j)
mInv[i][j] = NaN;
Another constructor allows specifying the elements of the matrix using a regular 2D array.
〈Transform Public Methods〉 +≡
Transform(const Float mat[4][4]) : Transform(SquareMatrix<4>(mat)) {}
The most commonly used constructor takes a reference to the transformation matrix along with an explicitly provided inverse. This is a superior approach to computing the inverse in the constructor because many geometric transformations have simple inverses and we can avoid the expense and potential loss of numeric accuracy from computing a general 4 × 4 matrix inverse. Of course, this places the burden on the caller to make sure that the supplied inverse is correct.
〈Transform Public Methods〉 +≡
Transform(const SquareMatrix<4> &m, const SquareMatrix<4> &mInv)
: m(m), mInv(mInv) {}
Both the matrix and its inverse are made available for callers that need to access them directly.
〈Transform Public Methods〉 +≡
const SquareMatrix<4> &GetMatrix() const { return m; }
const SquareMatrix<4> &GetInverseMatrix() const { return mInv; }
The Transform representing the inverse of a Transform can be returned by just swapping the roles of mInv and m.
〈Transform Inline Functions〉 ≡
Transform Inverse(const Transform &t) {
return Transform(t.GetInverseMatrix(), t.GetMatrix());
}
Transposing the two matrices in the transform to compute a new transform can also be useful.
〈Transform Inline Functions〉 +≡
Transform Transpose(const Transform &t) {
return Transform(Transpose(t.GetMatrix()),
Transpose(t.GetInverseMatrix()));
}
The Transform class also provides equality and inequality testing methods as well as an IsIdentity() method that checks to see if the transformation is the identity.
〈Transform Public Methods〉 +≡
bool operator==(const Transform &t) const { return t.m == m; }
bool operator!=(const Transform &t) const { return t.m != m; }
bool IsIdentity() const { return m.IsIdentity(); }
One of the simplest transformations is the translation transformation, T(Δx, Δy, Δz). When applied to a point p, it translates p’s coordinates by Δx, Δy, and Δz, as shown in Figure 3.25. As an example, T(2, 2, 1)(x, y, z) = (x + 2, y + 2, z + 1).
Translation has some basic properties:
T(0, 0, 0) = I
T(x1, y1, z1) T(x2, y2, z2) = T(x1 + x2, y1 + y2, z1 + z2)
T(x1, y1, z1) T(x2, y2, z2) = T(x2, y2, z2) T(x1, y1, z1)
T^{-1}(x, y, z) = T(−x, −y, −z).
Translation only affects points, leaving vectors unchanged. In matrix form, the translation transformation is
T(Δx, Δy, Δz) =
[ 1 0 0 Δx ]
[ 0 1 0 Δy ]
[ 0 0 1 Δz ]
[ 0 0 0 1 ]
When we consider the operation of a translation matrix on a point, we see the value of homogeneous coordinates. Consider the product of the matrix for T(Δx, Δy, Δz) with a point p in homogeneous coordinates [x y z 1]^T:
T(Δx, Δy, Δz) [x y z 1]^T = [x + Δx  y + Δy  z + Δz  1]^T.
As expected, we have computed a new point with its coordinates offset by (Δx, Δy, Δz). However, if we apply T to a vector v, we have
T(Δx, Δy, Δz) [x y z 0]^T = [x y z 0]^T.
Figure 3.25: Translation in 2D. Adding offsets Δx and Δy to a point’s coordinates correspondingly changes its position in space.
The result is the same vector v. This makes sense because vectors represent directions, so translation leaves them unchanged.
The Translate() function returns a Transform that represents a given translation—it is a straightforward application of the translation matrix equation. The inverse of the translation is easily computed, so it is provided to the Transform constructor as well.
〈Transform Function Definitions〉 ≡
Transform Translate(Vector3f delta) {
SquareMatrix<4> m(1, 0, 0, delta.x,
0, 1, 0, delta.y,
0, 0, 1, delta.z,
0, 0, 0, 1);
SquareMatrix<4> minv(1, 0, 0, -delta.x,
0, 1, 0, -delta.y,
0, 0, 1, -delta.z,
0, 0, 0, 1);
return Transform(m, minv);
}
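A small sketch (illustrative only, using the Transform operations defined earlier) confirms the relationship between a translation and its inverse:
Transform t = Translate(Vector3f(2, 2, 1));
// The inverse of a translation is translation by the negated offsets.
bool sameInverse = (Inverse(t) == Translate(Vector3f(-2, -2, -1)));    // true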
Another basic transformation is the scale transformation, S(sx, sy, sz). It has the effect of taking a point or vector and multiplying its components by scale factors in x, y, and z: S(2, 2, 1)(x, y, z) = (2x, 2y, z). It has the following basic properties:
S(1, 1, 1) = I
S(x1, y1, z1) S(x2, y2, z2) = S(x1 x2, y1 y2, z1 z2)
S^{-1}(x, y, z) = S(1/x, 1/y, 1/z).
We can differentiate between uniform scaling, where all three scale factors have the same value, and nonuniform scaling, where they may have different values. The general scale matrix is
S(x, y, z) =
[ x 0 0 0 ]
[ 0 y 0 0 ]
[ 0 0 z 0 ]
[ 0 0 0 1 ]
〈Transform Function Definitions〉 +≡
Transform Scale(Float x, Float y, Float z) {
SquareMatrix<4> m(x, 0, 0, 0,
0, y, 0, 0,
0, 0, z, 0,
0, 0, 0, 1);
SquareMatrix<4> minv(1 / x, 0, 0, 0,
0, 1 / y, 0, 0,
0, 0, 1 / z, 0,
0, 0, 0, 1);
return Transform(m, minv);
}
It is useful to be able to test if a transformation has a scaling term in it; an easy way to do this is to transform the three coordinate axes and see if any of their lengths are appreciably different from one.
〈Transform Public Methods〉 +≡
bool HasScale(Float tolerance = 1e-3f) const {
Float la2 = LengthSquared((*this)(Vector3f(1, 0, 0)));
Float lb2 = LengthSquared((*this)(Vector3f(0, 1, 0)));
Float lc2 = LengthSquared((*this)(Vector3f(0, 0, 1)));
return (std::abs(la2 - 1) > tolerance ||
std::abs(lb2 - 1) > tolerance ||
std::abs(lc2 - 1) > tolerance);
}
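For example (an illustrative sketch, not from pbrt's source), a scale transformation reports having a scale term while a translation does not, since translation leaves vectors unchanged:
bool s1 = Scale(2, 2, 2).HasScale();                    // true
bool s2 = Translate(Vector3f(1, 0, 0)).HasScale();      // false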
3.9.6 x, y, AND z AXIS ROTATIONS
Another useful type of transformation is the rotation transformation, R. In general, we can define an arbitrary axis from the origin in any direction and then rotate around that axis by a given angle. The most common rotations of this type are around the x, y, and z coordinate axes. We will write these rotations as Rx(θ), Ry(θ), and so on. The rotation around an arbitrary axis (x, y, z) is denoted by R(x,y,z)(θ).
Rotations also have some basic properties:
Ra(0) = I
Ra(θ1) Ra(θ2) = Ra(θ1 + θ2)
Ra(θ1) Ra(θ2) = Ra(θ2) Ra(θ1)
Ra^{-1}(θ) = Ra(−θ) = Ra^T(θ),
where R^T is the matrix transpose of R. This last property, that the inverse of R is equal to its transpose, stems from the fact that R is an orthogonal matrix; its first three columns (or rows) are all normalized and orthogonal to each other. Fortunately, the transpose is much easier to compute than a full matrix inverse.
For a left-handed coordinate system, the matrix for clockwise rotation around the x axis is
Rx(θ) =
[ 1 0 0 0 ]
[ 0 cos θ −sin θ 0 ]
[ 0 sin θ cos θ 0 ]
[ 0 0 0 1 ]
Figure 3.26 gives an intuition for how this matrix works.
It is easy to see that the matrix leaves the x axis unchanged:
Rx(θ)[1 0 0 0]T = [1 0 0 0]T.
It maps the y axis (0, 1, 0) to (0, cos θ, sin θ) and the z axis to (0, − sin θ, cos θ). The y and z axes remain in the same plane, perpendicular to the x axis, but are rotated by the given angle. An arbitrary point in space is similarly rotated about the x axis by this transformation while staying in the same yz plane as it was originally.
The implementation of the RotateX() function is straightforward.
Figure 3.26: Clockwise rotation by an angle θ about the x axis leaves the x coordinate unchanged. The y and z axes are mapped to the vectors given by the dashed lines; y and z coordinates move accordingly.
〈Transform Function Definitions〉 +≡
Transform RotateX(Float theta) {
Float sinTheta = std::sin(Radians(theta));
Float cosTheta = std::cos(Radians(theta));
SquareMatrix<4> m(1, 0, 0, 0,
0, cosTheta, -sinTheta, 0,
0, sinTheta, cosTheta, 0,
0, 0, 0, 1);
return Transform(m, Transpose(m));
}
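A brief usage sketch (illustrative only) checks the mapping of the y axis described above:
Transform r = RotateX(90);
Vector3f yRotated = r(Vector3f(0, 1, 0));    // approximately (0, 0, 1)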
Similarly, for clockwise rotation around y and z, we have
Ry(θ) =
[ cos θ 0 sin θ 0 ]
[ 0 1 0 0 ]
[ −sin θ 0 cos θ 0 ]
[ 0 0 0 1 ]
Rz(θ) =
[ cos θ −sin θ 0 0 ]
[ sin θ cos θ 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
The implementations of RotateY() and RotateZ() follow directly and are not included here.
3.9.7 ROTATION AROUND AN ARBITRARY AXIS
We also provide a routine to compute the transformation that represents rotation around an arbitrary axis. A common derivation of this matrix is based on computing rotations that map the given axis to a fixed axis (e.g., z), performing the rotation there, and then rotating the fixed axis back to the original axis. A more elegant derivation can be constructed with vector algebra.
Consider a normalized direction vector a that gives the axis to rotate around by angle θ, and a vector v to be rotated (Figure 3.27).
First, we can compute the vector vc along the axis a whose end point lies in the plane that passes through the end point of v perpendicular to a. Assuming v and a form an angle α, we have
vc = a ‖v‖ cos α = a(v · a).
We now compute a pair of basis vectors v1 and v2 in this plane. Trivially, one of them is
v1 = v − vc,
Figure 3.27: A vector v can be rotated around an arbitrary axis a by constructing a coordinate system (p, v1, v2) in the plane perpendicular to the axis that passes through v’s end point and rotating the vectors v1 and v2 about p. Applying this rotation to the axes of the coordinate system (1, 0, 0), (0, 1, 0), and (0, 0, 1) gives the general rotation matrix for this rotation.
and the other can be computed with a cross product
v2 = (v1 × a).
Because a is normalized, v1 and v2 have the same length, equal to the length of the vector between v and vc. To now compute the rotation by an angle θ about vc in the plane of rotation, the rotation formulae earlier give us
v′ = vc + v1 cos θ + v2 sin θ.
To convert this to a rotation matrix, we apply this formula to the basis vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) to get the values of the rows of the matrix. The result of all this is encapsulated in the following function. As with the other rotation matrices, the inverse is equal to the transpose.
Because some callers of the Rotate() function already have sin θ and cos θ at hand, pbrt provides a variant of the function that takes those values directly.
〈Transform Inline Functions〉 +≡
Transform Rotate(Float sinTheta, Float cosTheta, Vector3f axis) {
Vector3f a = Normalize(axis);
SquareMatrix<4> m;
〈Compute rotation of first basis vector 126〉
〈Compute rotations of second and third basis vectors〉
return Transform(m, Transpose(m));
}
〈Compute rotation of first basis vector〉 ≡
m[0][0] = a.x * a.x + (1 - a.x * a.x) * cosTheta;
m[0][1] = a.x * a.y * (1 - cosTheta) - a.z * sinTheta;
m[0][2] = a.x * a.z * (1 - cosTheta) + a.y * sinTheta;
m[0][3] = 0;
The code for the other two basis vectors follows similarly and is not included here.
A second variant of Rotate() takes the angle θ in degrees, computes its sine and cosine, and calls the first.
〈Transform Inline Functions〉 +≡
Transform Rotate(Float theta, Vector3f axis) {
Float sinTheta = std::sin(Radians(theta));
Float cosTheta = std::cos(Radians(theta));
return Rotate(sinTheta, cosTheta, axis);
}
3.9.8 ROTATING ONE VECTOR TO ANOTHER
It is sometimes useful to find the transformation that performs a rotation that aligns one unit vector f with another t (where f denotes “from” and t denotes “to”). One way to do so is to define a rotation axis by the cross product of the two vectors, compute the rotation angle as the arccosine of their dot product, and then use the Rotate() function. However, this approach not only becomes unstable when the two vectors are nearly parallel but also requires a number of expensive trigonometric function calls.
A different approach to deriving this rotation matrix is based on finding a pair of reflection transformations that reflect f to an intermediate vector r and then reflect r to t. The product of such a pair of reflections gives the desired rotation. The Householder matrix H(v) provides a way to find these reflections: it reflects the given vector v to its negation −v while leaving all vectors orthogonal to v unchanged and is defined as
H(v) = I − (2 / (v · v)) v v^T,
where I is the identity matrix.
With the product of the two reflections
R = H(r − t) H(r − f), (3.10)
the second matrix reflects f to r and the first then reflects r to t, which together give the desired rotation.
〈Transform Inline Functions〉 +≡
Transform RotateFromTo(Vector3f from, Vector3f to) {
〈Compute intermediate vector for vector reflection 127〉
〈Initialize matrix r for rotation 128〉
return Transform(r, Transpose(r));
}
The intermediate reflection direction refl is determined by choosing a basis vector that is not too closely aligned to either of the from and to vectors. In the computation here, because 0.72 is just slightly greater than √(1/2) ≈ 0.707, the absolute value of at least one pair of matching coordinates must then both be less than 0.72, assuming the vectors are normalized. In this way, a loss of accuracy is avoided when the reflection direction is nearly parallel to either from or to.
〈Compute intermediate vector for vector reflection〉 ≡
Vector3f refl;
if (std::abs(from.x) < 0.72f && std::abs(to.x) < 0.72f)
refl = Vector3f(1, 0, 0);
else if (std::abs(from.y) < 0.72f && std::abs(to.y) < 0.72f)
refl = Vector3f(0, 1, 0);
else
refl = Vector3f(0, 0, 1);
Given the reflection axis, the matrix elements can be initialized directly.
〈Initialize matrix r for rotation〉 ≡
Vector3f u = refl - from, v = refl - to;
SquareMatrix<4> r;
for (int i = 0; i < 3; ++i)
for (int j = 0; j < 3; ++j)
〈Initialize matrix element r[i][j] 128〉
Expanding the product of the Householder matrices in Equation (3.10), we can find that the matrix element r_{i,j} is given by
r_{i,j} = δ_{i,j} − (2 / (u · u)) u_i u_j − (2 / (v · v)) v_i v_j + (4 (u · v) / ((u · u)(v · v))) v_i u_j,
where u = r − f, v = r − t, and δ_{i,j} is the Kronecker delta function that is 1 if i and j are equal and 0 otherwise. The implementation follows directly.
〈Initialize matrix element r[i][j]〉 ≡
r[i][j] = ((i == j) ? 1 : 0) -
2 / Dot(u, u) * u[i] * u[j] -
2 / Dot(v, v) * v[i] * v[j] +
4 * Dot(u, v) / (Dot(u, u) * Dot(v, v)) * v[i] * u[j];
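A short usage sketch (illustrative, not from pbrt's source) applies the resulting transformation to verify that it maps the "from" direction to the "to" direction:
Vector3f from(0, 0, 1), to = Normalize(Vector3f(1, 1, 1));
Transform r = RotateFromTo(from, to);
Vector3f mapped = r(from);    // approximately equal to to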
3.9.9 THE LOOK-AT TRANSFORMATION
The look-at transformation is particularly useful for placing a camera in the scene. The caller specifies the desired position of the camera, a point the camera is looking at, and an “up” vector that orients the camera along the viewing direction implied by the first two parameters. All of these values are typically given in world-space coordinates; this gives a transformation from world space to camera space (Figure 3.28). We will assume that use in the discussion below, though note that this way of specifying transformations can also be useful for placing light sources in the scene.
In order to find the entries of the look-at transformation matrix, we use principles described earlier in this section: the columns of a transformation matrix give the effect of the transformation on the basis of a coordinate system.
Figure 3.28: Given a camera position, the position being looked at from the camera, and an “up” direction, the look-at transformation describes a transformation from a left-handed viewing coordinate system where the camera is at the origin looking down the +z axis, and the +y axis is along the up direction.
〈Transform Function Definitions〉 +≡
Transform LookAt(Point3f pos, Point3f look, Vector3f up) {
SquareMatrix<4> worldFromCamera;
〈Initialize fourth column of viewing matrix 129〉
〈Initialize first three columns of viewing matrix 129〉
SquareMatrix<4> cameraFromWorld = InvertOrExit(worldFromCamera);
return Transform(cameraFromWorld, worldFromCamera);
}
The easiest column is the fourth one, which gives the point that the camera-space origin, [0 0 0 1]T, maps to in world space. This is clearly just the camera position, supplied by the user.
〈Initialize fourth column of viewing matrix〉 ≡
worldFromCamera[0][3] = pos.x;
worldFromCamera[1][3] = pos.y;
worldFromCamera[2][3] = pos.z;
worldFromCamera[3][3] = 1;
The other three columns are not much more difficult. First, LookAt() computes the normalized direction vector from the camera location to the look-at point; this gives the vector coordinates that the z axis should map to and, thus, the third column of the matrix. (In a left-handed coordinate system, camera space is defined with the viewing direction down the +z axis.) The first column, giving the world-space direction that the +x axis in camera space maps to, is found by taking the cross product of the user-supplied “up” vector with the recently computed viewing direction vector. Finally, the “up” vector is recomputed by taking the cross product of the viewing direction vector with the transformed x axis vector, thus ensuring that the y and z axes are perpendicular and we have an orthonormal viewing coordinate system.
〈Initialize first three columns of viewing matrix〉 ≡
    Vector3f dir = Normalize(look - pos);
    Vector3f right = Normalize(Cross(Normalize(up), dir));
    Vector3f newUp = Cross(dir, right);
    worldFromCamera[0][0] = right.x;
    worldFromCamera[1][0] = right.y;
    worldFromCamera[2][0] = right.z;
    worldFromCamera[3][0] = 0.;
    worldFromCamera[0][1] = newUp.x;
    worldFromCamera[1][1] = newUp.y;
    worldFromCamera[2][1] = newUp.z;
    worldFromCamera[3][1] = 0.;
    worldFromCamera[0][2] = dir.x;
    worldFromCamera[1][2] = dir.y;
    worldFromCamera[2][2] = dir.z;
    worldFromCamera[3][2] = 0.;
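As a brief usage sketch (the particular values here are illustrative assumptions, not taken from the text), a camera five units down the −z axis, looking at the origin with +y up, could be specified as:

    Transform cameraTransform = LookAt(Point3f(0, 0, -5),   // camera position
                                       Point3f(0, 0, 0),    // point looked at
                                       Vector3f(0, 1, 0));  // "up" direction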
3.10 APPLYING TRANSFORMATIONS
We can now define routines that perform the appropriate matrix multiplications to transform points and vectors. We will overload the function application operator to describe these transformations; this lets us write code like:
Point3f p = ...;
Transform T = ...;
Point3f pNew = T(p);
3.10.1 POINTS
The point transformation routine takes a point (x, y, z) and implicitly represents it as the homogeneous column vector [x y z 1]T. It then transforms the point by premultiplying this vector with the transformation matrix. Finally, it divides by w to convert back to a non-homogeneous point representation. For efficiency, this method skips the division by the homogeneous weight, w, when w = 1, which is common for most of the transformations that will be used in pbrt—only the projective transformations defined in Chapter 5 will require this division.
〈Transform Inline Methods〉 ≡
template <typename T>
Point3<T> Transform::operator()(Point3<T> p) const {
T xp = m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3];
T yp = m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3];
T zp = m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3];
T wp = m[3][0] * p.x + m[3][1] * p.y + m[3][2] * p.z + m[3][3];
if (wp == 1)
return Point3<T>(xp, yp, zp);
else
return Point3<T>(xp, yp, zp) / wp;
}
The Transform class also provides a corresponding ApplyInverse() method for each type it transforms. The one for Point3 applies its inverse transformation to the given point. Calling this method is more succinct and generally more efficient than calling Transform::Inverse() and then calling its operator().
〈Transform Public Methods〉 +≡ template <typename T> Point3<T> ApplyInverse(Point3<T> p) const; |
120 |
All subsequent types that can be transformed also have an ApplyInverse() method, though we will not include them in the book text.
3.10.2 VECTORS
The transformations of vectors can be computed in a similar fashion. However, the multiplication of the matrix and the column vector is simplified since the implicit homogeneous w coordinate is zero.
〈Transform Inline Methods〉 +≡
template <typename T>
Vector3<T> Transform::operator()(Vector3<T> v) const {
return Vector3<T>(m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z,
m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z,
m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z);
}
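For a quick illustrative check (a sketch; the values are not from the text): because a vector's implicit homogeneous w coordinate is zero, a pure translation moves points but leaves direction vectors unchanged.

    Transform translate = Translate(Vector3f(1, 2, 3));
    Point3f pt = translate(Point3f(0, 0, 0));    // (1, 2, 3)
    Vector3f vt = translate(Vector3f(0, 0, 1));  // still (0, 0, 1)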
Figure 3.29: Transforming Surface Normals. (a) Original circle, with the normal at a point indicated by an arrow. (b) When scaling the circle to be half as tall in the y direction, simply treating the normal as a direction and scaling it in the same manner gives a normal that is no longer perpendicular to the surface. (c) A properly transformed normal.
3.10.3 NORMALS
Normals do not transform in the same way that vectors do, as shown in Figure 3.29. Although tangent vectors at a surface transform in the straightforward way, normals require special treatment. Because the normal vector n and any tangent vector t on the surface are orthogonal by construction, we know that
n · t = nT t = 0.
When we transform a point on the surface by some matrix M, the new tangent vector t′ at the transformed point is Mt. The transformed normal n′ should be equal to Sn for some 4×4 matrix S. To maintain the orthogonality requirement, we must have
    0 = (n′)T t′ = (Sn)T (Mt) = nT ST M t.
This condition holds if ST M = I, the identity matrix. Therefore, ST = M−1, and so S = (M−1)T, and we see that normals must be transformed by the inverse transpose of the transformation matrix. This detail is one of the reasons why Transforms maintain their inverses.
Note that this method does not explicitly compute the transpose of the inverse when transforming normals. It just indexes into the inverse matrix in a different order (compare to the code for transforming Vector3fs).
〈Transform Inline Methods〉 +≡
template <typename T>
Normal3<T> Transform::operator()(Normal3<T> n) const {
T x = n.x, y = n.y, z = n.z;
return Normal3<T>(mInv[0][0] * x + mInv[1][0] * y + mInv[2][0] * z,
mInv[0][1] * x + mInv[1][1] * y + mInv[2][1] * z,
mInv[0][2] * x + mInv[1][2] * y + mInv[2][2] * z);
}
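As an illustrative numerical sketch (the values are not from the text), consider squashing geometry with Scale(1, 0.5, 1). A tangent vector is transformed directly by the matrix, while the normal is transformed by the inverse transpose, which preserves their perpendicularity:

    Transform squash = Scale(1, 0.5, 1);
    Vector3f t = squash(Vector3f(1, 1, 0));    // tangent becomes (1, 0.5, 0)
    Normal3f n = squash(Normal3f(-1, 1, 0));   // normal becomes (-1, 2, 0)
    // Dot(t, Vector3f(n)) is still 0; transforming the normal like a vector
    // would give (-1, 0.5, 0), which is no longer perpendicular to t.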
3.10.4 RAYS
Transforming rays is conceptually straightforward: it is a matter of transforming the constituent origin and direction and copying the other data members. (pbrt also provides a similar method for transforming RayDifferentials.)
The approach used in pbrt to manage floating-point round-off error introduces some subtleties that require a small adjustment to the transformed ray origin. The 〈Offset ray origin to edge of error bounds and compute tMax〉 fragment handles these details; it is defined in Section 6.8.6, where round-off error and pbrt’s mechanisms for dealing with it are discussed.
〈Transform Inline Methods〉 +≡
Ray Transform::operator()(const Ray &r, Float *tMax) const {
Point3fi o = (*this)(Point3fi(r.o));
Vector3f d = (*this)(r.d);
〈Offset ray origin to edge of error bounds and compute tMax 383〉
return Ray(Point3f(o), d, r.time, r.medium);
}
3.10.5 BOUNDING BOXES
The easiest way to transform an axis-aligned bounding box is to transform all eight of its corner vertices and then compute a new bounding box that encompasses those points. The implementation of this approach is shown below; one of the exercises for this chapter is to implement a technique to do this computation more efficiently.
〈Transform Method Definitions〉 ≡
Bounds3f Transform::operator()(const Bounds3f &b) const {
Bounds3f bt;
for (int i = 0; i < 8; ++i)
bt = Union(bt, (*this)(b.Corner(i)));
return bt;
}
3.10.6 COMPOSITION OF TRANSFORMATIONS
Having defined how the matrices representing individual types of transformations are constructed, we can now consider an aggregate transformation resulting from a series of individual transformations. We will finally see the real value of representing transformations with matrices.
Consider a series of transformations ABC. We would like to compute a new transformation T such that applying T gives the same result as applying each of A, B, and C in reverse order; that is, A(B(C(p))) = T(p). Such a transformation T can be computed by multiplying the matrices of the transformations A, B, and C together. In pbrt, we can write:
Transform T = A * B * C;
Then we can apply T to Point3fs p as usual, Point3f pp = T(p), instead of applying each transformation in turn: Point3f pp = A(B(C(p))).
We overload the C++ * operator in the Transform class to compute the new transformation that results from postmultiplying a transformation with another transformation t2. In matrix multiplication, the (i, j)th element of the resulting matrix is the inner product of the ith row of the first matrix with the jth column of the second.
The inverse of the resulting transformation is equal to the product of t2.mInv * mInv. This is a result of the matrix identity
(AB)−1 = B−1A−1.
〈Transform Method Definitions〉 +≡
Transform Transform::operator*(const Transform &t2) const {
return Transform(m * t2.m, t2.mInv * mInv);
}
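For example (an illustrative sketch, not from the text), a transformation that first rotates an object about the z axis and then translates it is written with the rotation last, since it is the transformation applied first:

    Transform worldFromObject = Translate(Vector3f(2, 0, 0)) * RotateZ(45);
    Point3f p = worldFromObject(Point3f(1, 0, 0));
    // p is (1, 0, 0) rotated 45 degrees around +z and then offset by (2, 0, 0).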
3.10.7 TRANSFORMATIONS AND COORDINATE SYSTEM HANDEDNESS
Certain types of transformations change a left-handed coordinate system into a right-handed one, or vice versa. Some routines will need to know if the handedness of the source coordinate system is different from that of the destination. In particular, routines that want to ensure that a surface normal always points “outside” of a surface might need to flip the normal’s direction after transformation if the handedness changes.
Fortunately, it is easy to tell if handedness is changed by a transformation: it happens only when the determinant of the transformation’s upper-left 3×3 submatrix is negative.
〈Transform Method Definitions〉 +≡
bool Transform::SwapsHandedness() const {
SquareMatrix<3> s(m[0][0], m[0][1], m[0][2],
m[1][0], m[1][1], m[1][2],
m[2][0], m[2][1], m[2][2]);
return Determinant(s) < 0;
}
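For instance (a small sketch, not from the text), a scale with a single negative factor mirrors the coordinate system and therefore swaps handedness:

    Transform mirror = Scale(1, 1, -1);
    bool flips = mirror.SwapsHandedness();  // true: the upper 3x3 determinant is -1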
3.10.8 VECTOR FRAMES
It is sometimes useful to define a rotation that aligns three orthonormal vectors in a coordinate system with the x, y, and z axes. Applying such a transformation to direction vectors in that coordinate system can simplify subsequent computations. For example, in pbrt, BSDF evaluation is performed in a coordinate system where the surface normal is aligned with the z axis. Among other things, this makes it possible to efficiently evaluate trigonometric functions using functions like the CosTheta() function that was introduced in Section 3.8.3.
The Frame class efficiently represents and performs such transformations, avoiding the full generality (and hence, complexity) of the Transform class. It only needs to store a 3 × 3 matrix, and storing the inverse is unnecessary since it is just the matrix’s transpose, given orthonormal basis vectors.
〈Frame Definition〉 ≡
class Frame {
public:
〈Frame Public Methods 133〉
〈Frame Public Members 133〉
};
Given three orthonormal vectors x, y, and z, the matrix F that transforms vectors into their space is the 3×3 matrix whose three rows are x, y, and z, respectively.
The Frame stores this matrix using three Vector3fs.
〈Frame Public Members〉 ≡ Vector3f x, y, z; |
133 |
The three basis vectors can be specified explicitly; in debug builds, DCHECK()s in the constructor ensure that the provided vectors are orthonormal.
〈Frame Public Methods〉 ≡
    Frame() : x(1, 0, 0), y(0, 1, 0), z(0, 0, 1) {}
    Frame(Vector3f x, Vector3f y, Vector3f z);
Frame also provides convenience methods that construct a frame from just two of the basis vectors, using the cross product to compute the third.
〈Frame Public Methods〉 +≡
    static Frame FromXZ(Vector3f x, Vector3f z) {
        return Frame(x, Cross(z, x), z);
    }
    static Frame FromXY(Vector3f x, Vector3f y) {
        return Frame(x, y, Cross(x, y));
    }
Only the z axis vector can be provided as well, in which case the others are set arbitrarily.
〈Frame Public Methods〉 +≡
    static Frame FromZ(Vector3f z) {
        Vector3f x, y;
        CoordinateSystem(z, &x, &y);
        return Frame(x, y, z);
    }
A variety of other functions, not included here, allow specifying a frame using a normal vector and specifying it via just the x or y basis vector.
Transforming a vector into the frame’s coordinate space is done using the F matrix. Because Vector3fs were used to store its rows, the matrix-vector product can be expressed as three dot products.
〈Frame Public Methods〉 +≡
    Vector3f ToLocal(Vector3f v) const {
        return Vector3f(Dot(v, x), Dot(v, y), Dot(v, z));
    }
A ToLocal() method is also provided for normal vectors. In this case, we do not need to compute the inverse transpose of F to transform normals (recall the discussion of transforming normals in Section 3.10.3). Because F is an orthonormal matrix (its rows and columns are mutually orthogonal and unit length), its inverse is equal to its transpose, so it is already its own inverse transpose.
〈Frame Public Methods〉 +≡
    Normal3f ToLocal(Normal3f n) const {
        return Normal3f(Dot(n, x), Dot(n, y), Dot(n, z));
    }
The method that transforms vectors out of the frame’s local space transposes F to find its inverse before multiplying by the vector. In this case, the resulting computation can be expressed as the sum of three scaled versions of the matrix columns. As before, surface normals transform as regular vectors. (That method is not included here.)
〈Frame Public Methods〉 +≡
    Vector3f FromLocal(Vector3f v) const {
        return v.x * x + v.y * y + v.z * z;
    }
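As a hedged usage sketch (here n and w are assumed to be a Vector3f surface normal and direction that are not part of the text), a frame built around a normal lets a direction be expressed in the local shading coordinate system and then converted back:

    Frame f = Frame::FromZ(n);              // n becomes the local +z axis
    Vector3f wLocal = f.ToLocal(w);         // wLocal.z is the cosine of the angle to n
    Vector3f wWorld = f.FromLocal(wLocal);  // recovers w (up to round-off error)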
For convenience, there is a Transform constructor that takes a Frame. Its simple implementation is not included here.
〈Transform Public Methods〉 +≡ explicit Transform(const Frame &frame); |
120 |
Figure 3.30: Spinning Spheres. Three spheres, reflected in a mirror, spinning at different rates using pbrt’s transformation animation code. Note that the reflections of the spheres are blurry as well as the spheres themselves.
3.10.9 ANIMATING TRANSFORMATIONS
pbrt supports time-varying transformation matrices for cameras and geometric primitives in the scene. Rather than just supplying a single transformation to place an object in the scene, the user may supply a number of keyframe transformations, each one associated with a particular time. This makes it possible for the camera to move and for objects in the scene to be in motion during the time the simulated camera’s shutter is open. Figure 3.30 shows three spheres animated using keyframe matrix animation in pbrt.
Directly interpolating the matrix elements of transformation matrices at different times usually does not work well, especially if a rotation is included in the associated change of transformation. pbrt therefore implements algorithms that decompose transformations into translations, rotations, and scales, each of which can be independently interpolated before they are reassembled to form an interpolated transformation. The AnimatedTransform class that implements those algorithms is not included here in the printed book, though the online edition of the book (recall Section 1.4.3) includes thorough documentation of its implementation. Here we will summarize its interface so that its use in forthcoming text can be understood.
Its constructor takes two transformations and associated times. Due to the computational cost of decomposing and recomposing transformations as well as the storage requirements of AnimatedTransform, which total roughly 400 bytes, it is worthwhile to avoid using AnimatedTransform if the two matrices are equal.
AnimatedTransform(Transform startTransform, Float startTime,
Transform endTransform, Float endTime);
The Interpolate() method returns the interpolated transformation for the given time. If the time is outside of the range specified to the constructor, whichever of startTransform or endTransform is closest in time is returned.
Transform Interpolate(Float time) const;
Methods are also available to apply transformations and inverse transformations to pbrt’s basic geometric classes. For example, the following two methods transform points. (Because Point3f does not store an associated time, the time must be provided separately. However, classes like Ray and Interaction that do store a time are passed to their transformation methods unaccompanied.)
Point3f operator()(Point3f p, Float time) const;
Point3f ApplyInverse(Point3f p, Float time) const;
It is usually more efficient to transform a geometric object using those methods than to retrieve the interpolated Transform using the Interpolate() method and then use its transformation methods since the specialized transformation methods can apply optimizations like not computing unneeded inverse transformations.
The other key method provided by AnimatedTransform is MotionBounds(), which computes a bounding box that bounds the motion of a bounding box over the AnimatedTransform’s time range. Taking the union of the bounds of the transformed bounding box at startTime and endTime is not sufficient to bound the box’s motion over intermediate times; this method therefore takes care of the tricky details of accurately bounding the motion.
Bounds3f MotionBounds(const Bounds3f &b) const;
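As a hedged sketch of typical use (the particular transformations and the objectBounds variable are illustrative assumptions, not from the text), an object translating along +x over the shutter interval [0, 1] could be handled as follows:

    AnimatedTransform animXform(Translate(Vector3f(0, 0, 0)), 0,
                                Translate(Vector3f(1, 0, 0)), 1);
    Transform middle = animXform.Interpolate(0.5);                 // halfway along the motion
    Bounds3f motionBounds = animXform.MotionBounds(objectBounds);  // bounds over [0, 1]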
3.11 INTERACTIONS
The last abstractions in this chapter, SurfaceInteraction and MediumInteraction, respectively represent local information at points on surfaces and in participating media. For example, the ray–shape intersection routines in Chapter 6 return information about the local differential geometry at intersection points in a SurfaceInteraction. Later, the texturing code in Chapter 10 computes material properties using values from the SurfaceInteraction. The closely related MediumInteraction class is used to represent points where light interacts with participating media like smoke or clouds. The implementations of all of these classes are in the files interaction.h and interaction.cpp.
Both SurfaceInteraction and MediumInteraction inherit from a generic Interaction class that provides common member variables and methods, which allows parts of the system for which the differences between surface and medium interactions do not matter to be implemented purely in terms of Interactions.
〈Interaction Definition〉 ≡
class Interaction {
public:
〈Interaction Public Methods 136〉
〈Interaction Public Members 137〉
};
A variety of Interaction constructors are available; depending on what sort of interaction is being constructed and what sort of information about it is relevant, corresponding sets of parameters are accepted. This one is the most general of them.
〈Interaction Public Methods〉 ≡
    Interaction(Point3fi pi, Normal3f n, Point2f uv, Vector3f wo, Float time)
        : pi(pi), n(n), uv(uv), wo(Normalize(wo)), time(time) {}
All interactions have a point p associated with them. This point is stored using the Point3fi class, which uses an Interval to represent each coordinate value. Storing a small interval of floating-point values rather than a single Float makes it possible to represent bounds on the numeric error in the intersection point, as occurs when the point p was computed by a ray intersection calculation. This information will be useful for avoiding incorrect self-intersections for rays leaving surfaces, as will be discussed in Section 6.8.6.
〈Interaction Public Members〉 ≡ Point3fi pi; |
136 |
Interaction provides a convenience method that returns a regular Point3f for the interaction point for the parts of the system that do not need to account for any error in it (e.g., the texture evaluation routines).
〈Interaction Public Methods〉 +≡ Point3f p() const { return Point3f(pi); } |
136 |
All interactions also have a time associated with them. Among other uses, this value is necessary for setting the time of a spawned ray leaving the interaction.
〈Interaction Public Members〉 +≡ Float time = 0; |
136 |
For interactions that lie along a ray (either from a ray–shape intersection or from a ray passing through participating media), the negative ray direction is stored in the wo member variable, which corresponds to ωo, the notation we use for the outgoing direction when computing lighting at points. For other types of interaction points where the notion of an outgoing direction does not apply (e.g., those found by randomly sampling points on the surface of shapes), wo has the value (0, 0, 0).
〈Interaction Public Members〉 +≡ Vector3f wo; |
136 |
For interactions on surfaces, n stores the surface normal at the point and uv stores its (u, v) parametric coordinates. It is fair to ask, why are these values stored in the base Interaction class rather than in SurfaceInteraction? The reason is that there are some parts of the system that mostly do not care about the distinction between surface and medium interactions—for example, some of the routines that sample points on light sources given a point to be illuminated. Those make use of these values if they are available and ignore them if they are set to zero. By accepting the small dissonance of having them in the wrong place here, the implementations of those methods and the code that calls them are made that much simpler.
〈Interaction Public Members〉 +≡ Normal3f n; Point2f uv; |
136 |
It is possible to check if a pointer or reference to an Interaction is one of the two subclasses. A nonzero surface normal is used as a distinguisher for a surface.
〈Interaction Public Methods〉 +≡
    bool IsSurfaceInteraction() const { return n != Normal3f(0, 0, 0); }
    bool IsMediumInteraction() const { return !IsSurfaceInteraction(); }
Methods are provided to cast to the subclass types as well. This is a good place for a run-time check to ensure that the requested conversion is valid. The non-const variant of this method as well as corresponding AsMedium() methods follow similarly and are not included in the text.
〈Interaction Public Methods〉 +≡
    const SurfaceInteraction &AsSurface() const {
        CHECK(IsSurfaceInteraction());
        return (const SurfaceInteraction &)*this;
    }
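A hedged usage sketch (intr is an assumed Interaction reference, not from the text): code that only needs surface-specific information can test and cast accordingly.

    if (intr.IsSurfaceInteraction()) {
        const SurfaceInteraction &si = intr.AsSurface();
        // ... use surface-specific members such as si.uv ...
    }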
Interactions can also represent either an interface between two types of participating media using an instance of the MediumInterface class, which is defined in Section 11.4, or the properties of the scattering medium at their point using a Medium. Here as well, the Interaction abstraction leaks: surfaces can represent interfaces between media, and at a point inside a medium, there is no interface but there is the current medium. Both of these values are stored in Interaction for the same reasons of expediency that n and uv were.
〈Interaction Public Members〉 +≡ const MediumInterface *mediumInterface = nullptr; Medium medium = nullptr; |
136 |
As described earlier, the geometry of a particular point on a surface (often a position found by intersecting a ray against the surface) is represented by a SurfaceInteraction. Having this abstraction lets most of the system work with points on surfaces without needing to consider the particular type of geometric shape the points lie on.
〈SurfaceInteraction Definition〉 ≡
class SurfaceInteraction : public Interaction {
public:
〈SurfaceInteraction Public Methods 139〉
〈SurfaceInteraction Public Members 138〉
};
In addition to the point p, the surface normal n, and (u, v) coordinates from the parameterization of the surface from the Interaction base class, the SurfaceInteraction also stores the parametric partial derivatives of the point ∂p/∂u and ∂p/∂v and the partial derivatives of the surface normal ∂n/∂u and ∂n/∂v. See Figure 3.31 for a depiction of these values.
〈SurfaceInteraction Public Members〉 ≡ Vector3f dpdu, dpdv; Normal3f dndu, dndv; |
138 |
This representation implicitly assumes that shapes have a parametric description—that for some range of (u, v) values, points on the surface are given by some function f such that p = f(u, v). Although this is not true for all shapes, all of the shapes that pbrt supports do have at least a local parametric description, so we will stick with the parametric representation since this assumption is helpful elsewhere (e.g., for antialiasing of textures in Chapter 10).
The SurfaceInteraction constructor takes parameters that set all of these values. It computes the normal as the cross product of the partial derivatives.
Figure 3.31: The Local Differential Geometry around a Point p. The parametric partial derivatives of the surface, ∂p/∂u and ∂p/∂v, lie in the tangent plane but are not necessarily orthogonal. The surface normal n is given by the cross product of ∂p/∂u and ∂p/∂v. The vectors ∂n/∂u and ∂n/∂v record the differential change in surface normal as we move u and v along the surface.
〈SurfaceInteraction Public Methods〉 ≡
    SurfaceInteraction(Point3fi pi, Point2f uv, Vector3f wo, Vector3f dpdu,
                       Vector3f dpdv, Normal3f dndu, Normal3f dndv, Float time,
                       bool flipNormal)
        : Interaction(pi, Normal3f(Normalize(Cross(dpdu, dpdv))), uv, wo, time),
          dpdu(dpdu), dpdv(dpdv), dndu(dndu), dndv(dndv) {
        〈Initialize shading geometry from true geometry 139〉
        〈Adjust normal based on orientation and handedness 140〉
    }
SurfaceInteraction stores a second instance of a surface normal and the various partial derivatives to represent possibly perturbed values of these quantities—as can be generated by bump mapping or interpolated per-vertex normals with meshes. Some parts of the system use this shading geometry, while others need to work with the original quantities.
〈SurfaceInteraction Public Members〉 +≡
    struct {
        Normal3f n;
        Vector3f dpdu, dpdv;
        Normal3f dndu, dndv;
    } shading;
The shading geometry values are initialized in the constructor to match the original surface geometry. If shading geometry is present, it generally is not computed until some time after the SurfaceInteraction constructor runs. The SetShadingGeometry() method, to be defined shortly, updates the shading geometry.
〈Initialize shading geometry from true geometry〉 ≡
    shading.n = n;
    shading.dpdu = dpdu;
    shading.dpdv = dpdv;
    shading.dndu = dndu;
    shading.dndv = dndv;
The surface normal has special meaning to pbrt, which assumes that, for closed shapes, the normal is oriented such that it points to the outside of the shape. For geometry used as an area light source, light is by default emitted from only the side of the surface that the normal points toward; the other side is black. Because normals have this special meaning, pbrt provides a mechanism for the user to reverse the orientation of the normal, flipping it to point in the opposite direction. A ReverseOrientation directive in a pbrt input file flips the normal to point in the opposite, non-default direction. Therefore, it is necessary to check if the given Shape has the corresponding flag set and, if so, switch the normal’s direction here.
However, one other factor plays into the orientation of the normal and must be accounted for here as well. If a shape's transformation matrix has switched the handedness of the object coordinate system from pbrt's default left-handed coordinate system to a right-handed one, we need to switch the orientation of the normal as well. To see why this is so, consider a scale matrix S(1, 1, −1). We would naturally expect this scale to switch the direction of the normal. However, because we have computed the normal as the cross product n = ∂p/∂u × ∂p/∂v of the transformed partial derivatives, the computed normal is not flipped by this scale. Therefore, it is necessary to flip the normal's direction explicitly if the transformation switches the handedness of the coordinate system, since the flip will not be accounted for by the computation of the normal's direction using the cross product. A flag passed by the caller indicates whether this flip is necessary.
〈Adjust normal based on orientation and handedness〉 ≡
    if (flipNormal) {
        n *= -1;
        shading.n *= -1;
    }
pbrt also provides the capability to associate an integer index with each face of a polygon mesh. This information is used for certain texture mapping operations. A separate SurfaceInteraction constructor allows its specification.
〈SurfaceInteraction Public Members〉 +≡ int faceIndex = 0; |
138 |
When a shading coordinate frame is computed, the SurfaceInteraction is updated via its SetShadingGeometry() method.
〈SurfaceInteraction Public Methods〉 +≡
    void SetShadingGeometry(Normal3f ns, Vector3f dpdus, Vector3f dpdvs,
                            Normal3f dndus, Normal3f dndvs,
                            bool orientationIsAuthoritative) {
        〈Compute shading.n for SurfaceInteraction 141〉
        〈Initialize shading partial derivative values 141〉
    }
After performing the same cross product (and possibly flipping the orientation of the normal) as before to compute an initial shading normal, the implementation then flips either the shading normal or the true geometric normal if needed so that the two normals lie in the same hemisphere. Since the shading normal generally represents a relatively small perturbation of the geometric normal, the two of them should always be in the same hemisphere.
Depending on the context, either the geometric normal or the shading normal may more authoritatively point toward the correct “outside” of the surface, so the caller passes a Boolean value that determines which should be flipped if needed.
〈Compute shading.n for SurfaceInteraction〉 ≡
    shading.n = ns;
    if (orientationIsAuthoritative)
        n = FaceForward(n, shading.n);
    else
        shading.n = FaceForward(shading.n, n);
With the normal set, the various partial derivatives can be copied.
〈Initialize shading partial derivative values〉 ≡
    shading.dpdu = dpdus;
    shading.dpdv = dpdvs;
    shading.dndu = dndus;
    shading.dndv = dndvs;
As described earlier, the MediumInteraction class is used to represent an interaction at a point in a scattering medium like smoke or clouds.
〈MediumInteraction Definition〉 ≡
class MediumInteraction : public Interaction {
public:
〈MediumInteraction Public Methods 141〉
〈MediumInteraction Public Members 141〉
};
In contrast to SurfaceInteraction, it adds little to the base Interaction class. The only addition is a PhaseFunction, which describes how the particles in the medium scatter light. Phase functions and the PhaseFunction class are introduced in Section 11.3.
〈MediumInteraction Public Methods〉 ≡
    MediumInteraction(Point3f p, Vector3f wo, Float time, Medium medium,
                      PhaseFunction phase)
        : Interaction(p, wo, time, medium), phase(phase) {}

〈MediumInteraction Public Members〉 ≡
    PhaseFunction phase;
FURTHER READING
DeRose, Goldman, and their collaborators have argued for an elegant “coordinate-free” approach to describing vector geometry for graphics, where the fact that positions and directions happen to be represented by (x, y, z) coordinates with respect to a particular coordinate system is deemphasized and where points and vectors themselves record which coordinate system they are expressed in terms of (Goldman 1985; DeRose 1989; Mann, Litke, and DeRose 1997). This makes it possible for a software layer to ensure that common errors like adding a vector in one coordinate system to a point in another coordinate system are transparently handled by transforming them to a common coordinate system first. A related approach was described by Geisler et al. (2020), who encoded coordinate systems using the programming language’s type system. We have not followed either of these approaches in pbrt, although the principles behind them are well worth understanding and keeping in mind when working with coordinate systems in computer graphics.
Schneider and Eberly’s Geometric Tools for Computer Graphics is influenced by the coordinate-free approach and covers the topics of this chapter in much greater depth (Schneider and Eberly 2003). It is also full of useful geometric algorithms for graphics. A classic and more traditional introduction to the topics of this chapter is Mathematical Elements for Computer Graphics by Rogers and Adams (1990). Note that their book uses a row-vector representation of points and vectors, however, which means that our matrices would be transposed when expressed in their framework, and that they multiply points and vectors by matrices to transform them (pM), rather than multiplying matrices by points as we do (Mp). Homogeneous coordinates were only briefly mentioned in this chapter, although they are the basis of projective geometry, where they are the foundation of many elegant algorithms. Stolfi’s book is an excellent introduction to this topic (Stolfi 1991).
There are many good books on linear algebra and vector geometry. We have found Lang (1986) and Buck (1978) to be good references on these respective topics. See also Akenine-Möller et al.’s Real-Time Rendering book (2018) for a solid graphics-based introduction to linear algebra. Ström et al. have written an excellent online linear algebra book, immersivemath.com, that features interactive figures that illustrate the key concepts (2020).
Donnay’s book (1945) gives a concise but complete introduction to spherical trigonometry. The expression for the solid angle of a triangle in Equation (3.6) is due to Van Oosterom and Strackee (1983).
An alternative approach for designing a vector math library is exemplified by the widely used eigen system by Guennebaud, Jacob, and others (2010). In addition to including support for CPU SIMD vector instruction sets, it makes extensive use of expression templates, a C++ programming technique that makes it possible to simplify and optimize the evaluation of vector and matrix expressions at compile time.
The subtleties of how normal vectors are transformed were first widely understood in the graphics community after articles by Wallis (1990) and Turkowski (1990b).
Cigolle et al. (2014) compared a wide range of approaches for compactly encoding unit vectors. The approach implemented in OctahedralVector is due to Meyer et al. (2010), who also showed that if 52 bits are used with this representation, the precision is equal to that of normalized Vector3fs. (Our implementation also includes an improvement suggested by Cigolle et al. (2014).) The octahedral encoding it is based on was introduced by Praun and Hoppe (2003).
The equal-area sphere mapping algorithm in Section 3.8.3 is due to Clarberg (2008); our implementation of the mapping functions is derived from the high-performance CPU SIMD implementation that accompanies that paper. The square-to-hemisphere mapping that it is based on was developed by Shirley and Chiu (1997).
The algorithm used in CoordinateSystem() is based on an approach first derived by Frisvad (2012). The reformulation to improve numerical accuracy that we have used in our implementation was derived concurrently by Duff et al. (2017) and by Max (2017). The algorithm implemented in RotateFromTo() was introduced by Möller and Hughes (1999), with an adjustment to the computation of the reflection vector due to Hughes (2021).
The numerically robust AngleBetween() function defined in this chapter is due to Hatch (2003).
An algorithm to compute a tight bounding cone for multiple direction vectors was given by Barequet and Elber (2005).
The algorithms used in the AnimatedTransform implementation are based on the polar matrix decomposition approach that was described by Shoemake and Duff (1992); see the online edition of this book for further references to methods for animating transformations.
EXERCISES

➊ 3.1 Find a more efficient way to transform axis-aligned bounding boxes by taking advantage of the symmetries of the problem: because the eight corner points are linear combinations of three axis-aligned basis vectors and a single corner point, their transformed bounding box can be found more efficiently than by the method we have presented (Arvo 1990).

➋ 3.2 Instead of boxes, tighter bounds around objects could be computed by using the intersections of many nonorthogonal slabs. Extend the bounding box representation in pbrt to allow the user to specify a bound comprised of arbitrary slabs.

➋ 3.3 The DirectionCone::BoundSubtendedDirections() method bounds the directions that a Bounds3f subtends from a given reference point by first finding a sphere that bounds the Bounds3f and then bounding the directions it subtends. While this gives a valid bound, it is not necessarily the smallest one possible. Derive an improved algorithm that acts directly on the bounding box, update the implementation of BoundSubtendedDirections(), and render scenes where that method is used (e.g., those that use a BVHLightSampler to sample light sources). How are running time and image quality affected? Can you find a scene where this change gives a significant benefit?

➊ 3.4 Change pbrt so that it transforms Normal3fs just like Vector3fs, and create a scene that gives a clearly incorrect image due to this bug. (Do not forget to revert this change from your copy of the source code when you are done!)
_________________
1 This form of inheritance is often referred to as the curiously recurring template pattern (CRTP) in C++.
2 A tighter bound is possible in this case, but it occurs very rarely and so we have not bothered with handling it more effectively.
CHAPTER 04. RADIOMETRY, SPECTRA, AND COLOR
To precisely describe how light is represented and sampled to compute images, we must first establish some background in radiometry—the study of the propagation of electromagnetic radiation in an environment. In this chapter, we will first introduce four key quantities that describe electromagnetic radiation: flux, intensity, irradiance, and radiance.
These radiometric quantities generally vary as a function of wavelength. The variation of each is described by its spectral distribution—a distribution function that gives the amount of light at each wavelength. (We will interchangeably use spectrum to describe spectral distributions, and spectra for a plurality of them.) Of particular interest in rendering are the wavelengths (λ) of electromagnetic radiation between approximately 380 nm and 780 nm, which account for light visible to humans.1 A variety of classes that are used to represent spectral distributions in pbrt are defined in Section 4.5.
While spectral distributions are a purely physical concept, color is related to how humans perceive spectra. The lower wavelengths of light (λ ≈ 400 nm) are said to be bluish colors, the middle wavelengths (λ ≈ 550 nm) greens, and the upper wavelengths (λ ≈ 650 nm) reds. It is important to have accurate models of color for two reasons: first, display devices like monitors expect colors rather than spectra to describe pixel values, so accurately converting spectra to appropriate colors is important for displaying rendered images. Second, emission and reflection properties of objects in scenes are often specified using colors; these colors must be converted into spectra for use in rendering. Section 4.6, at the end of this chapter, describes the properties of color in more detail and includes implementations of pbrt’s color-related functionality.
4.1 RADIOMETRY
Radiometry provides a set of ideas and mathematical tools to describe light propagation and reflection. It forms the basis of the derivation of the rendering algorithms that will be used throughout the rest of this book. Interestingly enough, radiometry was not originally derived from first principles using the physics of light but was built on an abstraction of light based on particles flowing through space. As such, effects like polarization of light do not naturally fit into this framework, although connections have since been made between radiometry and Maxwell’s equations, giving radiometry a solid basis in physics.
Radiative transfer is the phenomenological study of the transfer of radiant energy. It is based on radiometric principles and operates at the geometric optics level, where macroscopic properties of light suffice to describe how light interacts with objects much larger than the light’s wavelength. It is not uncommon to incorporate phenomena from wave optics models of light, but these results need to be expressed in the language of radiative transfer’s basic abstractions.
In this manner, it is possible to describe interactions of light with objects of approximately the same size as the wavelength of the light, and thereby model effects like dispersion and interference. At an even finer level of detail, quantum mechanics is needed to describe light’s interaction with atoms. Fortunately, direct simulation of quantum mechanical principles is unnecessary for solving rendering problems in computer graphics, so the intractability of such an approach is avoided.
In pbrt, we will assume that geometric optics is an adequate model for the description of light and light scattering. This leads to a few basic assumptions about the behavior of light that will be used implicitly throughout the system:
- Linearity: The combined effect of two inputs to an optical system is always equal to the sum of the effects of each of the inputs individually.
- Energy conservation: When light scatters from a surface or from participating media, the scattering events can never produce more energy than they started with.
- No polarization: Only the frequency distribution of light is considered; its polarization state is ignored.
- No fluorescence or phosphorescence: The behavior of light at one wavelength is assumed to be independent of light at other wavelengths or times.
- Steady state: Light in the environment is assumed to have reached equilibrium, so its radiance distribution is not changing over time.
The most significant loss from adopting a geometric optics model is the incompatibility with diffraction and interference effects. Even though this incompatibility can be circumvented—for example, by replacing radiance with the concept of a Wigner distribution function (Oh et al. 2010, Cuypers et al. 2012)—such extensions are beyond the scope of this book.
4.1.1 BASIC LIGHTING QUANTITIES
There are four radiometric quantities that are central to rendering: flux, irradiance/radiant exitance, intensity, and radiance. They can each be derived from energy by successively taking limits over time, area, and directions. All of these radiometric quantities are in general wavelength dependent, though we will defer that topic until Section 4.1.3.
Energy
Our starting point is energy, which is measured in joules (J). Sources of illumination emit photons, each of which is at a particular wavelength and carries a particular amount of energy. All the basic radiometric quantities are effectively different ways of measuring photons. A photon at wavelength λ carries energy
    Q = hc/λ,
where c is the speed of light, 299,792,458 m/s, and h is Planck’s constant, h ≈ 6.626 × 10−34 m2 kg/s.
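For a quick numeric check (a sketch, not from the text), the energy carried by a single photon of green light at 550 nm is on the order of 10−19 joules:

    double h = 6.626e-34, c = 299792458., lambda = 550e-9;
    double q = h * c / lambda;   // approximately 3.61e-19 J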
Flux
Energy measures work over some period of time, though under the steady-state assumption generally used in rendering, we are mostly interested in measuring light at an instant. Radiant flux, also known as power, is the total amount of energy passing through a surface or region of space per unit time. Radiant flux can be found by taking the limit of differential energy per differential time:
    Φ = lim_{Δt→0} ΔQ/Δt = dQ/dt.
Its units are joules/second (J/s), or more commonly, watts (W).
For example, given a light that emitted Q = 200,000 J over the course of an hour, if the same amount of energy was emitted at all times over the hour, we can find that the light source’s flux was
Φ = 200,000 J/3600 s ≈ 55.6 W.
Conversely, given flux as a function of time, we can integrate over a range of times to compute the total energy:
    Q = ∫_{t0}^{t1} Φ(t) dt.
Note that our notation here is slightly informal: among other issues, because photons are discrete quanta, it is not meaningful to take limits that go to zero for differential time. For the purposes of rendering, where the number of photons is enormous with respect to the measurements we are interested in, this detail is not problematic.
Total emission from light sources is generally described in terms of flux. Figure 4.1 shows flux from a point light source measured by the total amount of energy passing through imaginary spheres around the light. Note that the total amount of flux measured on either of the two spheres in Figure 4.1 is the same—although less energy is passing through any local part of the large sphere than the small sphere, the greater area of the large sphere means that the total flux is the same.
Figure 4.1: Radiant flux, Φ, measures energy passing through a surface or region of space. Here, flux from a point light source is measured at spheres that surround the light.
Irradiance and Radiant Exitance
Any measurement of flux requires an area over which photons per time is being measured. Given a finite area A, we can define the average density of power over the area by E = Φ/A. This quantity is either irradiance (E), the area density of flux arriving at a surface, or radiant exitance (M), the area density of flux leaving a surface. These measurements have units of W/m2. (The term irradiance is sometimes also used to refer to flux leaving a surface, but for clarity we will use different terms for the two cases.)
For the point light source example in Figure 4.1, irradiance at a point on the outer sphere is less than the irradiance at a point on the inner sphere, since the surface area of the outer sphere is larger. In particular, if the point source is emitting the same amount of illumination in all directions, then for a sphere in this configuration that has radius r,
    E = Φ / (4πr²).
This fact explains why the amount of energy received from a light at a point falls off with the squared distance from the light.
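As a small worked sketch (the 100 W figure is an illustrative assumption, and Pi is pbrt’s constant for π), the irradiance on a sphere around an isotropic point source falls off with the squared radius:

    double Phi = 100;                                  // total flux, in watts
    auto E = [&](double r) { return Phi / (4 * Pi * r * r); };
    // E(1) is about 7.96 W/m^2; E(2) is about 1.99 W/m^2, one quarter as much.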
More generally, we can define irradiance and radiant exitance by taking the limit of differential power per differential area at a point p:
    E(p) = lim_{ΔA→0} ΔΦ(p)/ΔA = dΦ(p)/dA,
and similarly for M(p), with the flux leaving rather than arriving at the surface.
We can also integrate irradiance over an area to find power:
    Φ = ∫_A E(p) dA.
The irradiance equation can also help us understand the origin of Lambert’s law, which says that the amount of light energy arriving at a surface is proportional to the cosine of the angle between the light direction and the surface normal (Figure 4.2). Consider a light source with area A and flux Φ that is illuminating a surface. If the light is shining directly down on the surface (as on the left side of the figure), then the area on the surface receiving light A1 is equal to A. Irradiance at any point inside A1 is then
    E1 = Φ / A.
Figure 4.2: Lambert’s Law. Irradiance arriving at a surface varies according to the cosine of the angle of incidence of illumination, since illumination is over a larger area at larger incident angles.
However, if the light is at an angle to the surface, the area on the surface receiving light is larger. If A is small, then the area receiving flux, A2, is roughly A/cos θ. For points inside A2, the irradiance is therefore
    E2 = Φ cos θ / A.
For example, light arriving at a 60-degree angle of incidence delivers half the irradiance of light arriving along the normal, since cos 60° = 1/2.
Intensity
Consider now an infinitesimal light source emitting photons. If we center this light source within the unit sphere, we can compute the angular density of emitted power. Intensity, denoted by I, is this quantity; it has units W/sr. Over the entire sphere of directions, we have
    I = Φ / (4π),
but more generally we are interested in taking the limit of a differential cone of directions:
    I = lim_{Δω→0} ΔΦ/Δω = dΦ/dω.
As usual, we can go back to power by integrating intensity: given intensity as a function of direction I(ω), we can integrate over a finite set of directions Ω to recover the power:
    Φ = ∫_Ω I(ω) dω.
Intensity describes the directional distribution of light, but it is only meaningful for point light sources.
Radiance
The final, and most important, radiometric quantity is radiance, L. Irradiance and radiant exitance give us differential power per differential area at a point p, but they do not distinguish the directional distribution of power. Radiance takes this last step and measures irradiance or radiant exitance with respect to solid angles. It is defined by
    L(p, ω) = lim_{Δω→0} ΔEω(p)/Δω = dEω(p)/dω,
where we have used Eω to denote irradiance at the surface that is perpendicular to the direction ω. In other words, radiance is not measured with respect to the irradiance incident at the surface p lies on. In effect, this change of measurement area serves to eliminate the cos θ factor from Lambert’s law in the definition of radiance.
Figure 4.3: Radiance L is defined as flux per unit solid angle dω per unit projected area dA⊥.
Radiance is the flux density per unit area, per unit solid angle. In terms of flux, it is defined by
    L = d²Φ / (dω dA⊥),
where dA⊥ is the projected area of dA on a hypothetical surface perpendicular to ω (Figure 4.3). Thus, it is the limit of the measurement of incident light at the surface as a cone of incident directions of interest dω becomes very small and as the local area of interest on the surface dA also becomes very small.
Of all of these radiometric quantities, radiance will be the one used most frequently throughout the rest of the book. An intuitive reason for this is that in some sense it is the most fundamental of all the radiometric quantities; if radiance is given, then all the other values can be computed in terms of integrals of radiance over areas and directions. Another nice property of radiance is that it remains constant along rays through empty space. It is thus a natural quantity to compute with ray tracing.
4.1.2 INCIDENT AND EXITANT RADIANCE FUNCTIONS
When light interacts with surfaces in the scene, the radiance function L is generally not continuous across the surface boundaries. In the most extreme case of a fully opaque surface (e.g., a mirror), the radiance function slightly above and slightly below a surface could be completely unrelated.
It therefore makes sense to take one-sided limits at the discontinuity to distinguish between the radiance function just above and below the surface:
    L+(p, ω) = lim_{t→0+} L(p + t np, ω),
    L−(p, ω) = lim_{t→0−} L(p + t np, ω),    (4.4)
Figure 4.4: (a) The incident radiance function Li(p, ω) describes the distribution of radiance arriving at a point as a function of position and direction. (b) The exitant radiance function Lo(p, ω) gives the distribution of radiance leaving the point. Note that for both functions, ω is oriented to point away from the surface, and thus, for example, Li(p, −ω) gives the radiance arriving on the other side of the surface than the one where ω lies.
where np is the surface normal at p. However, keeping track of one-sided limits throughout the text is unnecessarily cumbersome.
We prefer to solve this ambiguity by making a distinction between radiance arriving at the point (e.g., due to illumination from a light source) and radiance leaving that point (e.g., due to reflection from a surface).
Consider a point p on the surface of an object. There is some distribution of radiance arriving at the point that can be described mathematically by a function of position and direction. This function is denoted by Li(p, ω) (Figure 4.4). The function that describes the outgoing reflected radiance from the surface at that point is denoted by Lo(p, ω). Note that in both cases the direction vector ω is oriented to point away from p, but be aware that some authors use a notation where ω is reversed for Li terms so that it points toward p.
There is a simple relation between these more intuitive incident and exitant radiance functions and the one-sided limits from Equation (4.4):
Throughout the book, we will use the idea of incident and exitant radiance functions to resolve ambiguity in the radiance function at boundaries.
Another property to keep in mind is that at a point in space where there is no surface (i.e., in free space), L is continuous, so L+ = L−, which means
Lo(p, ω) = Li(p, −ω) = L(p, ω).
In other words, Li and Lo only differ by a direction reversal.
4.1.3 RADIOMETRIC SPECTRAL DISTRIBUTIONS
Thus far, all the radiometric quantities have been defined without considering variation in their distribution over wavelengths. They have therefore effectively been the integrals of wavelength-dependent quantities over an (unspecified) range of wavelengths of interest. Just as we were able to define the various radiometric quantities in terms of limits of other quantities, we can also define their spectral variants by taking their limits over small wavelength ranges.
For example, we can define spectral radiance Lλ as the limit of radiance over an infinitesimal interval of wavelengths Δλ,
    Lλ = lim_{Δλ→0} ΔL/Δλ = dL/dλ.
In turn, radiance can be found by integrating spectral radiance over a range of wavelengths:
    L = ∫_{λ0}^{λ1} Lλ(λ) dλ.
Definitions for the other radiometric quantities follow similarly. All of these spectral variants have an additional factor of 1/m in their units.
4.1.4 LUMINANCE AND PHOTOMETRY
All the radiometric measurements like flux, radiance, and so forth have corresponding photometric measurements. Photometry is the study of visible electromagnetic radiation in terms of its perception by the human visual system. Each spectral radiometric quantity can be converted to its corresponding photometric quantity by integrating against the spectral response curve V (λ), which describes the relative sensitivity of the human eye to various wavelengths.2
Luminance measures how bright a spectral power distribution appears to a human observer. For example, luminance accounts for the fact that a spectral distribution with a particular amount of energy in the green wavelengths will appear brighter to a human than a spectral distribution with the same amount of energy in blue.
We will denote luminance by Y; it is related to spectral radiance by an integral of their product over the visible wavelengths:
    Y = ∫_λ Lλ(λ) V(λ) dλ.
Luminance and the spectral response curve V (λ) are closely related to the XYZ representation of color, which will be introduced in Section 4.6.1.
The units of luminance are candelas per meter squared (cd/m2), where the candela is the photometric equivalent of radiant intensity. Some representative luminance values are given in Table 4.1.
All the other radiometric quantities that we have introduced in this chapter have photometric equivalents; they are summarized in Table 4.2.
Table 4.1: Representative Luminance Values for a Number of Lighting Conditions.

Condition                      Luminance (cd/m², or nits)
Sun at horizon                 600,000
60-watt lightbulb              120,000
Clear sky                      8,000
Typical office                 100–1,000
Typical computer display       1–100
Street lighting                1–10
Cloudy moonlight               0.25
Table 4.2: Radiometric Measurements and Their Photometric Analogs.

Radiometric       Unit          Photometric           Unit
Radiant energy    joule (J)     Luminous energy       talbot (T)
Radiant flux      watt (W)      Luminous flux         lumen (lm)
Intensity         W/sr          Luminous intensity    lm/sr = candela (cd)
Irradiance        W/m²          Illuminance           lm/m² = lux (lx)
Radiance          W/(m² sr)     Luminance             lm/(m² sr) = cd/m² = nit
Figure 4.5: Irradiance at a point p is given by the integral of radiance times the cosine of the incident direction over the entire upper hemisphere above the point.
4.2 WORKING WITH RADIOMETRIC INTEGRALS
A frequent task in rendering is the evaluation of integrals of radiometric quantities. In this section, we will present some tricks that can make it easier to do this. To illustrate the use of these techniques, we will take the computation of irradiance at a point as an example. Irradiance at a point p with surface normal n due to radiance over a set of directions Ω is
    E(p, n) = ∫_Ω Li(p, ω) |cos θ| dω,    (4.7)
where Li(p, ω) is the incident radiance function (Figure 4.5) and the cos θ factor in the integrand is due to the dA⊥ factor in the definition of radiance. θ is measured as the angle between ω and the surface normal n. Irradiance is usually computed over the hemisphere ℌ2(n) of directions about a given surface normal n.
Figure 4.6: The projected solid angle subtended by an object is the cosine-weighted solid angle that it subtends. It can be computed by finding the object’s solid angle, projecting it down to the plane perpendicular to the surface normal, and measuring its area there. Thus, the projected solid angle depends on the surface normal where it is being measured, since the normal orients the plane of projection.
The integral in Equation (4.7) is with respect to solid angle on the hemisphere and the measure dω corresponds to surface area on the unit hemisphere. (Recall the definition of solid angle in Section 3.8.1.)
4.2.1 INTEGRALS OVER PROJECTED SOLID ANGLE
The various cosine factors in the integrals for radiometric quantities can often distract from what is being expressed in the integral. This problem can be avoided using projected solid angle rather than solid angle to measure areas subtended by objects being integrated over. The projected solid angle subtended by an object is determined by projecting the object onto the unit sphere, as was done for the solid angle, but then projecting the resulting shape down onto the unit disk that is perpendicular to the surface normal (Figure 4.6). Integrals over hemispheres of directions with respect to cosine-weighted solid angle can be rewritten as integrals over projected solid angle.
The projected solid angle measure is related to the solid angle measure by
dω⊥ = |cos θ| dω,
so the irradiance-from-radiance integral over the hemisphere can be written more simply as
    E(p, n) = ∫_{ℌ2(n)} Li(p, ω) dω⊥.
For the rest of this book, we will write integrals over directions in terms of solid angle, rather than projected solid angle. In other sources, however, projected solid angle may be used, so it is always important to be aware of the integrand’s actual measure.
4.2.2 INTEGRALS OVER SPHERICAL COORDINATES
It is often convenient to transform integrals over solid angle into integrals over spherical coordinates (θ, ϕ) using Equation (3.7). In order to convert an integral over a solid angle to an integral over (θ, ϕ), we need to be able to express the relationship between the differential area of a set of directions dω and the differential area of a (θ, ϕ) pair (Figure 4.7). The differential area on the unit sphere dω is the product of the differential lengths of its sides, sin θ dϕ and dθ. Therefore,
    dω = sin θ dθ dϕ.
Figure 4.7: The differential area dω subtended by a differential solid angle is the product of the differential lengths of the two edges sin θdϕ and dθ. The resulting relationship, dω = sin θdθdϕ, is the key to converting between integrals over solid angles and integrals over spherical angles.
(This result can also be derived using the multidimensional transformation approach from Section 2.4.1.)
We can thus see that the irradiance integral over the hemisphere, Equation (4.7) with Ω = ℌ2(n), can equivalently be written as
E(p, n) = ∫_0^{2π} ∫_0^{π/2} Li(p, θ, ϕ) cos θ sin θ dθ dϕ.
If the radiance is the same from all directions, the equation simplifies to E = πLi.
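As a quick numerical check of this simplification (a standalone illustration written for this discussion, not part of pbrt), the following C++ program evaluates the spherical-coordinates form of the irradiance integral with a Riemann sum for constant incident radiance Li = 1; the result converges to π, as expected.

#include <cmath>
#include <cstdio>

// Riemann-sum evaluation of E = int_0^{2pi} int_0^{pi/2} Li cos(theta) sin(theta) dtheta dphi
// for constant incident radiance Li; the exact result is pi * Li.
int main() {
    const double Pi = 3.14159265358979323846, Li = 1.;
    const int n = 512;
    const double dTheta = (Pi / 2) / n, dPhi = (2 * Pi) / n;
    double E = 0;
    for (int i = 0; i < n; ++i) {
        double theta = (i + 0.5) * dTheta;
        for (int j = 0; j < n; ++j)
            E += Li * std::cos(theta) * std::sin(theta) * dTheta * dPhi;
    }
    std::printf("E = %f, pi * Li = %f\n", E, Pi * Li);
}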
One last useful transformation is to turn integrals over directions into integrals over area. Consider the irradiance integral in Equation (4.7) again, and imagine there is a quadrilateral with constant outgoing radiance and that we would like to compute the resulting irradiance at a point p. Computing this value as an integral over directions ω or spherical coordinates (θ, ϕ) is in general not straightforward, since given a particular direction it is nontrivial to determine if the quadrilateral is visible in that direction or (θ, ϕ). It is much easier to compute the irradiance as an integral over the area of the quadrilateral.
Differential area dA on a surface is related to differential solid angle as viewed from a point p by
dω = dA cos θ / r²,    (4.9)
where θ is the angle between the surface normal of dA and the vector to p, and r is the distance from p to dA (Figure 4.8). We will not derive this result here, but it can be understood intuitively: if dA is at distance 1 from p and is aligned exactly so that it is perpendicular to dω, then dω = dA, θ = 0, and Equation (4.9) holds. As dA moves farther away from p, or as it rotates so that it is not aligned with the direction of dω, the r2 and cos θ factors compensate accordingly to reduce dω.
Therefore, we can write the irradiance integral for the quadrilateral source as
E(p, n) = ∫_A L cos θi cos θo / r² dA,
where L is the emitted radiance from the surface of the quadrilateral, θi is the angle between the surface normal at p and the direction from p to the point p′ on the light, and θo is the angle between the surface normal at p′ on the light and the direction from p′ to p (Figure 4.9).
Figure 4.8: The differential solid angle dω subtended by a differential area dA is equal to dA cos θ/r2, where θ is the angle between dA’s surface normal and the vector to the point p and r is the distance from p to dA.
Figure 4.9: To compute irradiance at a point p from a quadrilateral source, it is easier to integrate over the surface area of the source than to integrate over the irregular set of directions that it subtends. The relationship between solid angles and areas given by Equation (4.9) lets us go back and forth between the two approaches.
4.3 SURFACE REFLECTION
When light is incident on a surface, the surface scatters the light, reflecting some of it back into the environment. There are two main effects that need to be described to model this reflection: the spectral distribution of the reflected light and its directional distribution. For example, the skin of a lemon mostly absorbs light in the blue wavelengths but reflects most of the light in the red and green wavelengths. Therefore, when it is illuminated with white light, its color is yellow. It has much the same color no matter what direction it is being observed from, although for some directions a highlight—a brighter area that is more white than yellow—is visible. In contrast, the light reflected from a point on a mirror depends almost entirely on the viewing direction. At a fixed point on the mirror, as the viewing angle changes, the object that is reflected in the mirror changes accordingly.
Reflection from translucent surfaces is more complex; a variety of materials ranging from skin and leaves to wax and liquids exhibit subsurface light transport, where light that enters the surface at one point exits it some distance away. (Consider, for example, how shining a flashlight in one’s mouth makes one’s cheeks light up, as light that enters the inside of the cheeks passes through the skin and exits the face.)
There are two abstractions for describing these mechanisms for light reflection: the BRDF and the BSSRDF, described in Sections 4.3.1 and 4.3.2, respectively. The BRDF describes surface reflection at a point neglecting the effect of subsurface light transport. For materials where this transport mechanism does not have a significant effect, this simplification introduces little error and makes the implementation of rendering algorithms much more efficient. The BSSRDF generalizes the BRDF and describes the more general setting of light reflection from translucent materials.
4.3.1 THE BRDF
The bidirectional reflectance distribution function (BRDF) gives a formalism for describing reflection from a surface. Consider the setting in Figure 4.10: we would like to know how much radiance is leaving the surface in the direction ωo toward the viewer, Lo(p, ωo), as a result of incident radiance along the direction ωi, Li(p, ωi). (When considering light scattering at a surface location, pbrt uses the convention that ωi refers to the direction from which the quantity of interest (radiance in this case) arrives, rather than the direction from which the Integrator reached the surface.)
If the direction ωi is considered as a differential cone of directions, the differential irradiance at p is
dE(p, ωi) = Li(p, ωi) cos θi dωi.
A differential amount of radiance will be reflected in the direction ωo due to this irradiance. Because of the linearity assumption from geometric optics, the reflected differential radiance is proportional to the irradiance
dLo(p, ωo) ∝ dE(p, ωi).
The constant of proportionality defines the surface’s BRDF fr for the particular pair of directions ωi and ωo:
fr(p, ωo, ωi) = dLo(p, ωo) / dE(p, ωi) = dLo(p, ωo) / (Li(p, ωi) cos θi dωi).
The spectral BRDF is defined by using spectral radiance in place of radiance.
Figure 4.10: The BRDF. The bidirectional reflectance distribution function is a 4D function over pairs of directions ωi and ωo that describes how much incident light along ωi is scattered from the surface in the direction ωo.
Physically based BRDFs have two important qualities:
1. Reciprocity: For all pairs of directions ωi and ωo, fr(p, ωi, ωo) = fr(p, ωo, ωi).
2. Energy conservation: The total energy of light reflected is less than or equal to the energy of incident light. For all directions ωo,
∫_{ℌ2(n)} fr(p, ωo, ω′) |cos θ′| dω′ ≤ 1.
Note that the value of the BRDF for a pair of directions ωi and ωo is not necessarily less than 1; it is only its integral that has this normalization constraint.
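To make this normalization constraint concrete, consider the Lambertian BRDF fr = R/π, which scatters incident light equally in all directions. The standalone sketch below (illustrative, not pbrt code) estimates the cosine-weighted integral of fr over the hemisphere with Monte Carlo, using uniform hemisphere sampling, and recovers the reflectance R ≤ 1.

#include <cmath>
#include <cstdio>
#include <random>

// Monte Carlo check that a Lambertian BRDF f_r = R / pi satisfies the energy
// conservation constraint: the integral of f_r |cos(theta)| over the hemisphere is R.
int main() {
    const double Pi = 3.14159265358979323846;
    const double R = 0.8;   // Lambertian reflectance
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> u(0., 1.);
    const int n = 1000000;
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        // For directions sampled uniformly over the hemisphere, cos(theta) is
        // uniform in [0, 1) and the PDF with respect to solid angle is 1 / (2 pi).
        double cosTheta = u(rng);
        double fr = R / Pi;
        sum += fr * cosTheta / (1 / (2 * Pi));
    }
    std::printf("estimated integral = %f (expected %f)\n", sum / n, R);
}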
Two quantities that are based on the BRDF will occasionally be useful. First, the hemispherical-directional reflectance is a 2D function that gives the total reflection in a given direction due to constant illumination over the hemisphere, or, equivalently, the total reflection over the hemisphere due to light from a given direction.4 It is defined as
ρhd(ωo) = ∫_{ℌ2(n)} fr(p, ωo, ωi) |cos θi| dωi.    (4.12)
The hemispherical-hemispherical reflectance of a BRDF, denoted by ρhh, gives the fraction of incident light reflected by a surface when the incident light is the same from all directions. It is
ρhh = (1/π) ∫_{ℌ2(n)} ∫_{ℌ2(n)} fr(p, ωo, ωi) |cos θo cos θi| dωo dωi.
A surface’s bidirectional transmittance distribution function (BTDF), which describes the distribution of transmitted light, can be defined in a manner similar to that for the BRDF. The BTDF is generally denoted by ft(p, ωo, ωi), where ωi and ωo are in opposite hemispheres around p. Remarkably, the BTDF does not obey reciprocity as defined above; we will discuss this issue in detail in Section 9.5.2.
For convenience in equations, we will denote the BRDF and BTDF when considered together as f (p, ωo, ωi); we will call this the bidirectional scattering distribution function (BSDF). Chapter 9 is entirely devoted to describing a variety of BSDFs that are useful for rendering.
Using the definition of the BSDF, we have
dLo(p, ωo) = f (p, ωo, ωi) Li(p, ωi) |cos θi| dωi.
Here an absolute value has been added to the cos θi factor. This is done because surface normals in pbrt are not reoriented to lie on the same side of the surface as ωi (many other rendering systems do this, although we find it more useful to leave them in their natural orientation as given by the Shape). Doing so makes it easier to consistently apply conventions like “the surface normal is assumed to point outside the surface” elsewhere in the system. Thus, applying the absolute value to cos θ factors like these ensures that the desired quantity is calculated.
We can integrate this equation over the sphere of incident directions around p to compute the outgoing radiance in direction ωo due to the incident illumination at p from all directions:
Lo(p, ωo) = ∫_{S2} f (p, ωo, ωi) Li(p, ωi) |cos θi| dωi.    (4.14)
This is a fundamental equation in rendering; it describes how an incident distribution of light at a point is transformed into an outgoing distribution, based on the scattering properties of the surface. It is often called the scattering equation when the sphere S2 is the domain (as it is here), or the reflection equation when just the upper hemisphere ℌ2(n) is being integrated over. One of the key tasks of the integration routines in Chapters 13 through 15 is to evaluate this integral at points on surfaces in the scene.
4.3.2 THE BSSRDF
The bidirectional scattering surface reflectance distribution function (BSSRDF) is the formalism that describes scattering from materials that exhibit subsurface light transport. It is a distribution function S(po, ωo, pi, ωi) that describes the ratio of exitant differential radiance at point po in direction ωo to the incident differential flux at pi from direction ωi (Figure 4.11):
S(po, ωo, pi, ωi) = dLo(po, ωo) / dΦ(pi, ωi).
The generalization of the scattering equation for the BSSRDF requires integration over surface area and incoming direction, turning the 2D scattering Equation (4.14) into a 4D integral.
With two more dimensions to integrate over, it is more complex to account for in rendering algorithms than Equation (4.14) is. However, as the distance between points pi and po increases, the value of S generally diminishes. This fact can be a substantial help in implementations of subsurface scattering algorithms.
Light transport beneath a surface is described by the same principles as volume light transport in participating media and is described by the equation of transfer, which is introduced in Section 14.1. Subsurface scattering is thus based on the same effects as light scattering in clouds and smoke—just at a smaller scale.
Figure 4.11: The bidirectional scattering surface reflectance distribution function generalizes the BSDF to account for light that exits the surface at a point other than where it enters. It is often more difficult to evaluate than the BSDF, although subsurface light transport makes a substantial contribution to the appearance of many real-world objects.
4.4 LIGHT EMISSION
The atoms of an object with temperature above absolute zero are moving. In turn, as described by Maxwell’s equations, the motion of atomic particles that hold electrical charges causes objects to emit electromagnetic radiation over a range of wavelengths. As we will see shortly, at room temperature most of the emission is at infrared frequencies; objects need to be much warmer to emit meaningful amounts of electromagnetic radiation at visible frequencies.
Many different types of light sources have been invented to convert energy into emitted electromagnetic radiation. An object that emits light is called a lamp or an illuminant, though we avoid the latter terminology since we generally use “illuminant” to refer to a spectral distribution of emission (Section 4.4.2). A lamp is housed in a luminaire, which consists of all the objects that hold and protect the light as well as any objects like reflectors or diffusers that shape the distribution of light.
Understanding some of the physical processes involved in emission is helpful for accurately modeling light sources for rendering. A number of corresponding types of lamps are in wide use today:
- Incandescent (tungsten) lamps have a small tungsten filament; passing electrical current through it heats the filament, which then emits electromagnetic radiation with a distribution of wavelengths that depends on its temperature. A frosted glass enclosure is often present to diffuse the emission.
- Halogen lamps also have a tungsten filament, but the enclosure is filled with halogen gas, which causes evaporated tungsten to be redeposited on the filament, lengthening the lamp’s life.
- Gas-discharge lamps pass electrical current through hydrogen, neon, argon, or vaporized metal gas, causing light to be emitted at wavelengths that depend on the particular atoms in the gas. A fluorescent coating on the bulb’s interior is often used to convert emitted ultraviolet wavelengths to visible ones.
- LED lights are based on electroluminescence: they use materials that emit photons due to electrical current passing through them.
For all of these sources, the underlying physical process is electrons colliding with atoms, which pushes their outer electrons to a higher energy level. When such an electron returns to a lower energy level, a photon is emitted. There are many other interesting processes that create light, including chemoluminescence (as seen in light sticks) and bioluminescence—a form of chemoluminescence seen in fireflies. Though interesting in their own right, we will not consider their mechanisms further here.
Luminous efficacy measures how effectively a light source converts power to visible illumination, accounting for the fact that for human observers, emission in non-visible wavelengths is of little value. Interestingly enough, it is the ratio of a photometric quantity (the emitted luminous flux) to a radiometric quantity (either the total power it uses or the total power that it emits over all wavelengths, measured in flux):
luminous efficacy = (∫ Φe(λ) V (λ) dλ) / (∫ Φi(λ) dλ),
where V (λ) is the spectral response curve that was introduced in Section 4.1.4.
Luminous efficacy has units of lumens per watt. If Φi is the power consumed by the light source (rather than the emitted power), then luminous efficacy also incorporates a measure of how effectively the light source converts power to electromagnetic radiation. Luminous efficacy can also be defined as a ratio of luminous exitance (the photometric equivalent of radiant exitance) to irradiance at a point on a surface, or as the ratio of exitant luminance to radiance at a point on a surface in a particular direction.
A typical value of luminous efficacy for an incandescent tungsten lightbulb is around 15 lm/W. The highest value it can possibly have is 683, for a perfectly efficient light source that emits all of its light at λ = 555 nm, the peak of the V (λ) function. (While such a light would have high efficacy, it would not necessarily be a pleasant one as far as human observers are concerned.)
4.4.1 BLACKBODY EMITTERS
A blackbody is a perfect emitter: it converts power to electromagnetic radiation as efficiently as physically possible. While true blackbodies are not physically realizable, some emitters exhibit near-blackbody behavior. Blackbodies also have a useful closed-form expression for their emission by wavelength as a function of temperature that is useful for modeling non-blackbody emitters.
Blackbodies are so-named because they absorb absolutely all incident power, reflecting none of it. Intuitively, the reasons that perfect absorbers are also perfect emitters stem from the fact that absorption is the reverse operation of emission. Thus, if time was reversed, all the perfectly absorbed power would be perfectly efficiently re-emitted.
Planck’s law gives the radiance emitted by a blackbody as a function of wavelength λ and temperature T measured in kelvins:
Le(λ, T) = (2hc²) / (λ⁵ (e^{hc/(λ kb T)} − 1)),    (4.17)
where c is the speed of light in the medium (299,792,458 m/s in a vacuum), h is Planck’s constant, 6.62606957 × 10⁻³⁴ J s, and kb is the Boltzmann constant, 1.3806488 × 10⁻²³ J/K, where kelvin (K) is the unit of temperature. Blackbody emitters are perfectly diffuse; they emit radiance equally in all directions.
Figure 4.12 plots the emitted radiance distributions of a blackbody for a number of temperatures.
The Blackbody() function computes emitted radiance at the given temperature T in Kelvin for the given wavelength lambda.
Figure 4.12: Plots of emitted radiance as a function of wavelength for blackbody emitters at a few temperatures, as given by Equation (4.17). Note that as temperature increases, more of the emitted light is in the visible frequencies (roughly 380 nm–780 nm) and that the spectral distribution shifts from reddish colors to bluish colors. The total amount of emitted energy grows quickly as temperature increases, as described by the Stefan–Boltzmann law in Equation (4.19).
〈Spectrum Function Declarations〉 ≡
Float Blackbody(Float lambda, Float T) {
if (T <= 0) return 0;
const Float c = 299792458.f;
const Float h = 6.62606957e-34f; const Float kb = 1.3806488e-23f;
〈Return emitted radiance for blackbody at wavelength lambda 162〉
}
The wavelength passed to Blackbody() is in nm, but the constants for Equation (4.17) are in terms of meters. Therefore, it is necessary to first convert the wavelength to meters by scaling it by 10⁻⁹.
〈Return emitted radiance for blackbody at wavelength lambda〉 ≡
    Float l = lambda * 1e-9f;
    Float Le = (2 * h * c * c) /
               (Pow<5>(l) * (FastExp((h * c) / (l * kb * T)) - 1));
    return Le;
The emission of non-blackbodies is described by Kirchhoff’s law, which says that the emitted radiance distribution at any frequency is equal to the emission of a perfect blackbody at that frequency times the fraction of incident radiance at that frequency that is absorbed by the object. (This relationship follows from the object being assumed to be in thermal equilibrium.) The fraction of radiance absorbed is equal to 1 minus the amount reflected, and so the emitted radiance is
L′e(T, λ, ω) = Le(T, λ) (1 − ρhd(ω)),
where Le(T, λ) is the emitted radiance given by Planck’s law, Equation (4.17), and ρhd(ω) is the hemispherical-directional reflectance from Equation (4.12).
The Stefan–Boltzmann law gives the radiant exitance (recall that this is the outgoing irradiance) at a point p for a blackbody emitter:
M(p) = σT⁴,    (4.19)
where σ is the Stefan–Boltzmann constant, 5.67032 × 10⁻⁸ W m⁻² K⁻⁴. Note that the total emission over all frequencies grows very rapidly—at the rate T⁴. Thus, doubling the temperature of a blackbody emitter increases the total energy emitted by a factor of 16.
The blackbody emission distribution provides a useful metric for describing the emission characteristics of non-blackbody emitters through the notion of color temperature. If the shape of the emitted spectral distribution of an emitter is similar to the blackbody distribution at some temperature, then we can say that the emitter has the corresponding color temperature. One approach to find color temperature is to take the wavelength where the light’s emission is highest and find the corresponding temperature using Wien’s displacement law, which gives the wavelength where emission of a blackbody is maximum given its temperature:
λmax = b / T,    (4.20)
where b is Wien’s displacement constant, 2.8977721 × 10⁻³ m K.
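The following standalone sketch (not pbrt's Blackbody(); it simply restates Equation (4.17) in double precision) numerically checks both laws for a blackbody at 2856 K: integrating π times the emitted radiance over wavelength approximately reproduces σT⁴, and Wien's law gives the wavelength of peak emission (roughly 1015 nm, in the infrared).

#include <cmath>
#include <cstdio>

// Planck's law, Equation (4.17): blackbody radiance at wavelength lambda (meters)
// and temperature T (kelvin).
static double Planck(double lambda, double T) {
    const double c = 299792458., h = 6.62606957e-34, kb = 1.3806488e-23;
    return (2 * h * c * c) /
           (std::pow(lambda, 5) * (std::exp((h * c) / (lambda * kb * T)) - 1));
}

int main() {
    const double Pi = 3.14159265358979323846, T = 2856;
    // Radiant exitance of a diffuse emitter: pi times the wavelength-integrated radiance.
    double M = 0, dLambda = 1e-9;
    for (double lambda = 10e-9; lambda < 10000e-9; lambda += dLambda)
        M += Pi * Planck(lambda, T) * dLambda;
    const double sigma = 5.67032e-8;  // Stefan-Boltzmann constant
    std::printf("integrated M = %g W/m^2, sigma T^4 = %g W/m^2\n",
                M, sigma * std::pow(T, 4));
    // Wien's displacement law, Equation (4.20): wavelength of maximum emission.
    std::printf("peak wavelength = %g nm\n", 2.8977721e-3 / T * 1e9);
}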
Incandescent tungsten lamps are generally around 2700 K color temperature, and tungsten halogen lamps are around 3000 K. Fluorescent lights may range all the way from 2700 K to 6500 K. Generally speaking, color temperatures over 5000 K are described as “cool,” while 2700–3000 K is described as “warm.”
4.4.2 STANDARD ILLUMINANTS
Another useful way of categorizing light emission distributions is a number of “standard illuminants” that have been defined by the Commission Internationale de l’Éclairage (CIE).
The Standard Illuminant A was introduced in 1931 and was intended to represent average incandescent light. It corresponds to a blackbody radiator of about 2856 K. (It was originally defined as a blackbody at 2850 K, but the accuracy of the constants used in Planck’s law subsequently improved. Therefore, the specification was updated to be in terms of the 1931 constants, so that the illuminant was unchanged.) Figure 4.13 shows a plot of the spectral distribution of the A illuminant.
(The B and C illuminants were intended to model daylight at two times of day and were generated with an A illuminant in combination with specific filters. They are no longer used.
Figure 4.13: Plot of the CIE Standard Illuminant A’s Spectral Power Distribution as a Function of Wavelength in nm. This illuminant represents incandescent illumination and is close to a blackbody at 2856 K.
Figure 4.14: Plot of the CIE Standard D65 Illuminant Spectral Distribution as a Function of Wavelength in nm. This illuminant represents noontime daylight at European latitudes and is commonly used to define the whitepoint of color spaces (Section 4.6.3).
Figure 4.15: Plots of the F4 and F9 Standard Illuminants as a Function of Wavelength in nm. These represent two fluorescent lights. Note that the distributions are quite different. Spikes in the two distributions correspond to the wavelengths directly emitted by atoms in the gas, while the other wavelengths are generated by the bulb’s fluorescent coating. The F9 illuminant is a “broadband” emitter that uses multiple phosphors to achieve a more uniform spectral distribution.
The E illuminant is defined as having a constant spectral distribution and is used only for comparisons to other illuminants.)
The D illuminant describes various phases of daylight. It was defined based on characteristic vector analysis of a variety of daylight spectra, which made it possible to express daylight in terms of a linear combination of three terms (one fixed and two weighted), with one weight essentially corresponding to yellow-blue color change due to cloudiness and the other corresponding to pink-green due to water in the atmosphere (from haze, etc.). D65 is roughly 6504 K color temperature (not 6500 K—again due to changes in the values used for the constants in Planck’s law) and is intended to correspond to mid-day sunlight in Europe. (See Figure 4.14.) The CIE recommends that this illuminant be used for daylight unless there is a specific reason not to.
Finally, the F series of illuminants describes fluorescents; it is based on measurements of a number of actual fluorescent lights. Figure 4.15 shows the spectral distributions of two of them.
Figure 4.16: Spectral Distribution of Reflection from Lemon Skin.
4.5 REPRESENTING SPECTRAL DISTRIBUTIONS
Spectral distributions in the real world can be complex; we have already seen a variety of complex emission spectra and Figure 4.16 shows a graph of the spectral distribution of the reflectance of lemon skin. In order to render images of scenes that include a variety of complex spectra, a renderer must have efficient and accurate representations of spectral distributions. This section will introduce pbrt’s abstractions for representing and performing computation with them; the corresponding code can be found in the files util/spectrum.h and util/spectrum.cpp.
We will start by defining constants that give the range of visible wavelengths. Both here and for the remainder of the spectral code in pbrt, wavelengths are specified in nanometers, which are of a magnitude that gives easily human-readable values for the visible wavelengths.
〈Spectrum Constants〉 ≡
constexpr Float Lambda_min = 360, Lambda_max = 830;
We will find a variety of spectral representations useful in pbrt, ranging from spectral sample values tabularized by wavelength to functional descriptions such as the blackbody function. This brings us to our first interface class, Spectrum. A Spectrum corresponds to a pointer to a class that implements one such spectral representation.
Spectrum inherits from TaggedPointer, which handles the details of runtime polymorphism. TaggedPointer requires that all the types of Spectrum implementations be provided as template parameters, which allows it to associate a unique integer identifier with each type. (See Section B.4.4 for details of its implementation.)
〈Spectrum Definition〉 ≡
class Spectrum
: public TaggedPointer<ConstantSpectrum, DenselySampledSpectrum,
PiecewiseLinearSpectrum, RGBAlbedoSpectrum,
RGBUnboundedSpectrum, RGBIlluminantSpectrum,
BlackbodySpectrum> {
public:
〈Spectrum Interface 166〉
};
As with other classes that are based on TaggedPointer, Spectrum defines an interface that must be implemented by all the spectral representations. Typical practice in C++ would be for such an interface to be specified by pure virtual methods in Spectrum and for Spectrum implementations to inherit from Spectrum and implement those methods. With the TaggedPointer approach, the interface is specified implicitly: for each method in the interface, there is a method in Spectrum that dispatches calls to the appropriate type’s implementation. We will discuss the details of how this works for a single method here but will omit them for other Spectrum methods and for other interface classes since they all follow the same boilerplate.
The most important method that Spectrum defines is operator(), which takes a single wavelength λ and returns the value of the spectral distribution for that wavelength.
〈Spectrum Interface〉 ≡
    Float operator()(Float lambda) const;
The corresponding method implementation is brief, though dense. A call to TaggedPointer:: Dispatch() begins the process of dispatching the method call. The TaggedPointer class stores an integer tag along with the object’s pointer that encodes its type; in turn, Dispatch() is able to determine the specific type of the pointer at runtime. It then calls the callback function provided to it with a pointer to the object, cast to be a pointer to its actual type.
The lambda function that is called here, op, takes a pointer with the auto type specifier for its parameter. In C++17, such a lambda function acts as a templated function; a call to it with a concrete type acts as an instantiation of a lambda that takes that type. Thus, the call (*ptr)(lambda) in the lambda body ends up as a direct call to the appropriate method.
〈Spectrum Inline Method Definitions〉 ≡
inline Float Spectrum::operator()(Float lambda) const {
auto op = [&](auto ptr) { return (*ptr)(lambda); };
return Dispatch(op);
}
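For readers unfamiliar with this dispatch pattern, the following standalone analogy (using std::variant and std::visit rather than pbrt's TaggedPointer; the toy types here are made up for illustration) shows the same idea: a type tag stored alongside the value lets a generic lambda be invoked with the concretely typed object, so the call resolves without virtual functions.

#include <cstdio>
#include <variant>

// Two toy spectrum representations (not pbrt's classes).
struct ToyConstantSpectrum { float c; float operator()(float) const { return c; } };
struct ToyLinearSpectrum { float a, b; float operator()(float l) const { return a + b * l; } };

// std::variant stores a type tag next to the value; std::visit plays the role of
// TaggedPointer::Dispatch(), instantiating the generic lambda once per alternative.
using ToySpectrum = std::variant<ToyConstantSpectrum, ToyLinearSpectrum>;

float Eval(const ToySpectrum &s, float lambda) {
    return std::visit([&](const auto &spec) { return spec(lambda); }, s);
}

int main() {
    ToySpectrum s = ToyConstantSpectrum{0.5f};
    std::printf("%f\n", Eval(s, 550));
    s = ToyLinearSpectrum{0.f, 0.001f};
    std::printf("%f\n", Eval(s, 550));
}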
Spectrum implementations must also provide a MaxValue() method that returns a bound on the maximum value of the spectral distribution over its wavelength range. This method’s main use in pbrt is for computing bounds on the power emitted by light sources so that lights can be sampled according to their expected contribution to illumination in the scene.
〈Spectrum Interface〉 +≡
    Float MaxValue() const;
4.5.2 GENERAL SPECTRAL DISTRIBUTIONS
With the Spectrum interface specified, we will start by defining a few Spectrum class implementations that explicitly tabularize values of the spectral distribution function. Constant Spectrum is the simplest: it represents a constant spectral distribution over all wavelengths. The most common use of the ConstantSpectrum class in pbrt is to define a zero-valued spectral distribution in cases where a particular form of scattering is not present.
The ConstantSpectrum implementation is straightforward and we omit its trivial MaxValue() method here. Note that it does not inherit from Spectrum. This is another difference from using traditional C++ abstract base classes with virtual functions—as far as the C++ type system is concerned, there is no explicit connection between ConstantSpectrum and Spectrum.
〈Spectrum Definitions〉 ≡
class ConstantSpectrum {
public:
ConstantSpectrum(Float c) : c(c) {}
Float operator()(Float lambda) const { return c; }
private:
Float c;
};
More expressive is DenselySampledSpectrum, which stores a spectral distribution sampled at 1 nm intervals over a given range of integer wavelengths [λmin, λmax].
〈Spectrum Definitions〉 +≡
class DenselySampledSpectrum {
public:
〈DenselySampledSpectrum Public Methods 167〉
private:
〈DenselySampledSpectrum Private Members 167〉
};
Its constructor takes another Spectrum and evaluates that spectral distribution at each wavelength in the range. DenselySampledSpectrum can be useful if the provided spectral distribution is computationally expensive to evaluate, as it allows subsequent evaluations to be performed by reading a single value from memory.
〈DenselySampledSpectrum Public Methods〉 ≡
    DenselySampledSpectrum(Spectrum spec, int lambda_min = Lambda_min,
                           int lambda_max = Lambda_max, Allocator alloc = {})
        : lambda_min(lambda_min), lambda_max(lambda_max),
          values(lambda_max - lambda_min + 1, alloc) {
        if (spec)
            for (int lambda = lambda_min; lambda <= lambda_max; ++lambda)
                values[lambda - lambda_min] = spec(lambda);
    }

〈DenselySampledSpectrum Private Members〉 ≡
    int lambda_min, lambda_max;
    pstd::vector<Float> values;
Finding the spectrum’s value for a given wavelength lambda is a matter of returning zero for wavelengths outside of the valid range and indexing into the stored values otherwise.
〈DenselySampledSpectrum Public Methods〉 +≡
    Float operator()(Float lambda) const {
        int offset = std::lround(lambda) - lambda_min;
        if (offset < 0 || offset >= values.size())
            return 0;
        return values[offset];
    }
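As a usage sketch (not code from pbrt's source; it assumes the constructors declared above and that a Spectrum, like other TaggedPointer-based interface types, can be initialized from a pointer to an implementation), an expensive-to-evaluate spectrum can be tabularized once so that later evaluations read a single value:

    BlackbodySpectrum bb(6500.f);        // BlackbodySpectrum is defined later in this section
    Spectrum spec = &bb;                 // interface pointer to the implementation
    DenselySampledSpectrum table(spec);  // samples spec at 1 nm steps over [Lambda_min, Lambda_max]
    Float v = table(550.f);              // subsequent lookups just index the stored values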
While sampling a spectral distribution at 1 nm wavelengths gives sufficient accuracy for most uses in rendering, doing so requires nearly 2 kB of memory to store a distribution that covers the visible wavelengths. PiecewiseLinearSpectrum offers another representation that is often more compact; its distribution is specified by a set of pairs of values (λi, vi) where the spectral distribution is defined by linearly interpolating between them; see Figure 4.17. For spectra that are smooth in some regions and change rapidly in others, this representation makes it possible to specify the distribution at a higher rate in regions where its variation is greatest.
Figure 4.17: PiecewiseLinearSpectrum defines a spectral distribution using a set of sample values (λi, vi). A continuous distribution is then defined by linearly interpolating between them.
〈Spectrum Definitions〉 +≡
class PiecewiseLinearSpectrum {
public:
〈PiecewiseLinearSpectrum Public Methods 168〉
private:
〈PiecewiseLinearSpectrum Private Members 168〉
};
The PiecewiseLinearSpectrum constructor, not included here, checks that the provided lambda values are sorted and then stores them and the associated spectrum values in corresponding member variables.
〈PiecewiseLinearSpectrum Public Methods〉 ≡
    PiecewiseLinearSpectrum(pstd::span<const Float> lambdas,
                            pstd::span<const Float> values, Allocator alloc = {});

〈PiecewiseLinearSpectrum Private Members〉 ≡
    pstd::vector<Float> lambdas, values;
Finding the value for a given wavelength requires first finding the pair of values in the lambdas array that bracket it and then linearly interpolating between them.
〈Spectrum Method Definitions〉 ≡
Float PiecewiseLinearSpectrum::operator()(Float lambda) const {
〈Handle PiecewiseLinearSpectrum corner cases 168〉
〈Find offset to largest lambdas below lambda and interpolate 169〉
}
As with DenselySampledSpectrum, wavelengths outside of the specified range are given a value of zero.
〈Handle PiecewiseLinearSpectrum corner cases〉 ≡
    if (lambdas.empty() || lambda < lambdas.front() || lambda > lambdas.back())
        return 0;
If lambda is in range, then FindInterval() gives the offset to the largest value of lambdas that is less than or equal to lambda. In turn, lambda’s offset between that wavelength and the next gives the linear interpolation parameter to use with the stored values.
〈Find offset to largest lambdas below lambda and interpolate〉 ≡
    int o = FindInterval(lambdas.size(),
                         [&](int i) { return lambdas[i] <= lambda; });
    Float t = (lambda - lambdas[o]) / (lambdas[o + 1] - lambdas[o]);
    return Lerp(t, values[o], values[o + 1]);
The maximum value of the distribution is easily found using std::max_element(), which performs a linear search. This function is not currently called in any performance-sensitive parts of pbrt; if it was, it would likely be worth caching this value to avoid recomputing it.
〈Spectrum Method Definitions〉 +≡
Float PiecewiseLinearSpectrum::MaxValue() const {
if (values.empty()) return 0;
return *std::max_element(values.begin(), values.end());
}
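A brief usage sketch (illustrative only; it assumes that pstd::span, like std::span, can be constructed directly from a built-in array): a three-point piecewise-linear spectrum, evaluated between its first two samples.

    Float lambdas[] = {400, 550, 700};
    Float values[] = {0.2f, 1.0f, 0.4f};
    PiecewiseLinearSpectrum pls(lambdas, values);
    Float v = pls(500);        // t = (500 - 400) / (550 - 400), giving roughly 0.73
    Float m = pls.MaxValue();  // 1.0, found by the linear search above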
Another useful Spectrum implementation, BlackbodySpectrum, gives the spectral distribution of a blackbody emitter at a specified temperature.
〈Spectrum Definitions〉 +≡
class BlackbodySpectrum {
public:
〈BlackbodySpectrum Public Methods 169〉
private:
〈BlackbodySpectrum Private Members 169 〉
};
The temperature of the blackbody in Kelvin is the constructor’s only parameter.
〈BlackbodySpectrum Public Methods〉 ≡
    BlackbodySpectrum(Float T) : T(T) {
        〈Compute blackbody normalization constant for given temperature 169〉
    }

〈BlackbodySpectrum Private Members〉 ≡
    Float T;
Because the power emitted by a blackbody grows so quickly with temperature (recall the Stefan–Boltzmann law, Equation (4.19)), the BlackbodySpectrum represents a normalized blackbody spectral distribution where the maximum value at any wavelength is 1. Wien’s displacement law, Equation (4.20), gives the wavelength in meters where emitted radiance is at its maximum; we must convert this value to nm before calling Blackbody() to find the corresponding radiance value.
〈Compute blackbody normalization constant for given temperature〉 ≡
    Float lambdaMax = 2.8977721e-3f / T;
    normalizationFactor = 1 / Blackbody(lambdaMax * 1e9f, T);

〈BlackbodySpectrum Private Members〉 +≡
    Float normalizationFactor;
The method that returns the value of the distribution at a wavelength then returns the product of the value returned by Blackbody() and the normalization factor.
〈BlackbodySpectrum Public Methods〉 +≡
    Float operator()(Float lambda) const {
        return Blackbody(lambda, T) * normalizationFactor;
    }
pbrt’s scene description format provides multiple ways to specify spectral data, ranging from blackbody temperatures to arrays of λ-value pairs to specify a piecewise-linear spectrum. For convenience, a variety of useful spectral distributions are also embedded directly in the pbrt binary, including ones that describe the emission profiles of various types of light source, the scattering properties of various conductors, and the wavelength-dependent indices of refraction of various types of glass. See the online pbrt file format documentation for a list of all of them.
The GetNamedSpectrum() function searches through these spectra and returns a Spectrum corresponding to a given named spectrum if it is available.
〈Spectral Function Declarations〉 ≡
Spectrum GetNamedSpectrum(std::string name);
A number of important spectra are made available directly through corresponding functions, all of which are in a Spectra namespace. Among them are Spectra::X(), Spectra::Y(), and Spectra::Z(), which return the color matching curves that are described in Section 4.6.1, and Spectra::D(), which returns a DenselySampledSpectrum representing the D illuminant at the given temperature.
〈Spectrum Function Declarations〉 +≡
DenselySampledSpectrum D(Float T, Allocator alloc);
4.5.4 SAMPLED SPECTRAL DISTRIBUTIONS
The attentive reader may have noticed that although Spectrum makes it possible to evaluate spectral distribution functions, it does not provide the ability to do very much computation with them other than sampling their value at a specified wavelength. Yet, for example, evaluating the integrand of the reflection equation, (4.14), requires taking the product of two spectral distributions, one for the BSDF and one for the incident radiance function.
Providing this functionality with the abstractions that have been introduced so far would quickly become unwieldy. For example, while the product of two DenselySampledSpectrums could be faithfully represented by another DenselySampledSpectrum, consider taking the product of two PiecewiseLinearSpectrums: the resulting function would be piecewise-quadratic and subsequent products would only increase its degree. Further, operations between Spectrum implementations of different types would not only require a custom implementation for each pair, but would require choosing a suitable Spectrum representation for each result.
pbrt avoids this complexity by performing spectral calculations at a set of discrete wavelengths as part of the Monte Carlo integration that is already being performed for image synthesis. To understand how this works, consider computing the (non-spectral) irradiance at some point p with surface normal n over some range of wavelengths of interest, [λ0, λ1]. Using Equation (4.7), which expresses irradiance in terms of incident radiance, and Equation (4.5), which expresses radiance in terms of spectral radiance, we have
E = ∫_{λ0}^{λ1} ∫_{ℌ2(n)} Li(p, ω, λ) |cos θ| dω dλ,
where Li(p, ω, λ) is the incident spectral radiance at wavelength λ.
Applying the standard Monte Carlo estimator and taking advantage of the fact that ω and λ are independent, we can see that estimates of E can be computed by sampling directions ωi from some distribution pω, wavelengths λi from some distribution pλ, and then evaluating:
E ≈ (1/n) Σ_{i=1}^{n} (Li(p, ωi, λi) |cos θi|) / (pω(ωi) pλ(λi)).
Thus, we only need to be able to evaluate the integrand at the specified discrete wavelengths to estimate the irradiance. More generally, we will see that it is possible to express all the spectral quantities that pbrt outputs as integrals over wavelength. For example, Section 4.6 shows that when rendering an image represented using RGB colors, each pixel’s color can be computed by integrating the spectral radiance arriving at a pixel with functions that model red, green, and blue color response. pbrt therefore uses only discrete spectral samples for spectral computation.
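The following standalone program (not pbrt code) illustrates this estimator in its simplest form: for Li(p, ω, λ) = 1 over [400, 700] nm, uniform sampling of both the hemisphere and the wavelength range converges to the exact answer 300π.

#include <cmath>
#include <cstdio>
#include <random>

// Monte Carlo estimate of E = int_400^700 int_{H^2(n)} Li |cos(theta)| dw dlambda
// for Li = 1; the exact value is 300 * pi.
int main() {
    const double Pi = 3.14159265358979323846;
    std::mt19937 rng(3);
    std::uniform_real_distribution<double> u(0., 1.);
    const int n = 1000000;
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        double cosTheta = u(rng);            // uniform hemisphere sampling: p_w = 1 / (2 pi)
        double lambda = 400 + 300 * u(rng);  // uniform wavelength sampling: p_lambda = 1 / 300
        double Li = (lambda >= 400 && lambda <= 700) ? 1 : 0;  // incident spectral radiance
        sum += Li * cosTheta / ((1 / (2 * Pi)) * (1. / 300.));
    }
    std::printf("estimate = %f, exact = %f\n", sum / n, 300 * Pi);
}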
So that we can proceed to the implementation of the classes related to sampling spectra and performing computations with spectral samples, we will define the constant that sets the number of spectral samples here. (Section 4.6.5 will discuss in more detail the trade-offs involved in choosing this value.) pbrt uses 4 wavelength samples by default; this value can easily be changed, though doing so requires recompiling the system.
〈Spectrum Constants〉 +≡
static constexpr int NSpectrumSamples = 4;
SampledSpectrum
The SampledSpectrum class stores an array of NSpectrumSamples values that represent values of the spectral distribution at discrete wavelengths. It provides methods that allow a variety of mathematical operations to be performed with them.
〈SampledSpectrum Definition〉 ≡
class SampledSpectrum {
public:
〈SampledSpectrum Public Methods 171〉
private:
pstd::array<Float, NSpectrumSamples> values;
};
Its constructors include one that allows providing a single value for all wavelengths and one that takes an appropriately sized pstd::span of per-wavelength values.
〈SampledSpectrum Public Methods〉 ≡
    explicit SampledSpectrum(Float c) { values.fill(c); }
    SampledSpectrum(pstd::span<const Float> v) {
        for (int i = 0; i < NSpectrumSamples; ++i)
            values[i] = v[i];
    }
The usual indexing operations are also provided for accessing and setting each wavelength’s value.
〈SampledSpectrum Public Methods〉 +≡
    Float operator[](int i) const { return values[i]; }
    Float &operator[](int i) { return values[i]; }
It is often useful to know if all the values in a SampledSpectrum are zero. For example, if a surface has zero reflectance, then the light transport routines can avoid the computational cost of casting reflection rays that have contributions that would eventually be multiplied by zeros. This capability is provided through a type conversion operator to bool.5
〈SampledSpectrum Public Methods〉 +≡
    explicit operator bool() const {
        for (int i = 0; i < NSpectrumSamples; ++i)
            if (values[i] != 0) return true;
        return false;
    }
All the standard arithmetic operations on SampledSpectrum objects are provided; each operates component-wise on the stored values. The implementation of operator+= is below. The others are analogous and are therefore not included in the text.
〈SampledSpectrum Public Methods〉 +≡
    SampledSpectrum &operator+=(const SampledSpectrum &s) {
        for (int i = 0; i < NSpectrumSamples; ++i)
            values[i] += s.values[i];
        return *this;
    }
SafeDiv() divides two sampled spectra, but generates zero for any sample where the divisor is zero.
〈SampledSpectrum Inline Functions〉 ≡
SampledSpectrum SafeDiv(SampledSpectrum a, SampledSpectrum b) {
SampledSpectrum r;
for (int i = 0; i < NSpectrumSamples; ++i)
r[i] = (b[i] != 0) ? a[i] / b[i] : 0.;
return r;
}
In addition to the basic arithmetic operations, SampledSpectrum also provides Lerp(), Sqrt(), Clamp(), ClampZero(), Pow(), Exp(), and FastExp() functions that operate (again, component-wise) on SampledSpectrum objects; some of these operations are necessary for evaluating some of the reflection models in Chapter 9 and for evaluating volume scattering models in Chapter 14. Finally, MinComponentValue() and MaxComponentValue() return the minimum and maximum of all the values, and Average() returns their average. These methods are all straightforward and are therefore not included in the text.
SampledWavelengths
A separate class, SampledWavelengths, stores the wavelengths for which a SampledSpectrum stores samples. Thus, it is important not only to keep careful track of the SampledWavelengths that are represented by an individual SampledSpectrum but also to not perform any operations that combine SampledSpectrums that have samples at different wavelengths.
〈SampledWavelengths Definitions〉 ≡
class SampledWavelengths {
public:
〈SampledWavelengths Public Methods 173〉
private:
〈SampledWavelengths Private Members 173〉
};
To be used in the context of Monte Carlo integration, the wavelengths stored in Sampled Wavelengths must be sampled from some probability distribution. Therefore, the class stores the wavelengths themselves as well as each one’s probability density.
〈SampledWavelengths Private Members〉 ≡
    pstd::array<Float, NSpectrumSamples> lambda, pdf;
The easiest way to sample wavelengths is uniformly over a given range. This approach is implemented in the SampleUniform() method, which takes a single uniform sample u and a range of wavelengths.
〈SampledWavelengths Public Methods〉 ≡
    static SampledWavelengths SampleUniform(Float u, Float lambda_min = Lambda_min,
                                            Float lambda_max = Lambda_max) {
        SampledWavelengths swl;
        〈Sample first wavelength using u 173〉
        〈Initialize lambda for remaining wavelengths 173〉
        〈Compute PDF for sampled wavelengths 173〉
        return swl;
    }
It chooses the first wavelength uniformly within the range.
〈Sample first wavelength using u〉 ≡
    swl.lambda[0] = Lerp(u, lambda_min, lambda_max);
The remaining wavelengths are chosen by taking uniform steps delta starting from the first wavelength and wrapping around if lambda_max is passed. The result is a set of stratified wavelength samples that are generated using a single random number. One advantage of sampling wavelengths in this way rather than using a separate uniform sample for each one is that the value of NSpectrumSamples can be changed without requiring the modification of code that calls SampleUniform() to adjust the number of sample values that are passed to this method.
〈Initialize lambda for remaining wavelengths〉 ≡
    Float delta = (lambda_max - lambda_min) / NSpectrumSamples;
    for (int i = 1; i < NSpectrumSamples; ++i) {
        swl.lambda[i] = swl.lambda[i - 1] + delta;
        if (swl.lambda[i] > lambda_max)
            swl.lambda[i] = lambda_min + (swl.lambda[i] - lambda_max);
    }
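As a concrete illustration of this wrap-around stratification (a standalone sketch, not pbrt code; it assumes the default [360, 830] nm range and four samples), a single value u = 0.8 yields the wavelengths 736, 383.5, 501, and 618.5 nm—one in each of the four strata.

#include <cstdio>

int main() {
    const int n = 4;
    const float lambdaMin = 360, lambdaMax = 830, u = 0.8f;
    float delta = (lambdaMax - lambdaMin) / n;
    // First wavelength is chosen uniformly; the rest take uniform steps, wrapping
    // around at lambdaMax just as in the fragment above.
    float lambda = lambdaMin + u * (lambdaMax - lambdaMin);
    for (int i = 0; i < n; ++i) {
        std::printf("%g ", lambda);
        lambda += delta;
        if (lambda > lambdaMax) lambda = lambdaMin + (lambda - lambdaMax);
    }
    std::printf("\n");
}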
The probability density for each sample is easily computed, since the sampling distribution is uniform.
〈Compute PDF for sampled wavelengths〉 ≡
    for (int i = 0; i < NSpectrumSamples; ++i)
        swl.pdf[i] = 1 / (lambda_max - lambda_min);
Additional methods provide access to the individual wavelengths and to all of their PDFs. PDF values are returned in the form of a SampledSpectrum, which makes it easy to compute the value of associated Monte Carlo estimators.
〈SampledWavelengths Public Methods〉 +≡
    Float operator[](int i) const { return lambda[i]; }
    Float &operator[](int i) { return lambda[i]; }
    SampledSpectrum PDF() const { return SampledSpectrum(pdf); }
In some cases, different wavelengths of light may follow different paths after a scattering event. The most common example is when light undergoes dispersion and different wavelengths of light refract to different directions. When this happens, it is no longer possible to track multiple wavelengths of light with a single ray. For this case, SampledWavelengths provides the capability of terminating all but one of the wavelengths; subsequent computations can then consider the single surviving wavelength exclusively.
〈SampledWavelengths Public Methods〉 +≡
    void TerminateSecondary() {
        if (SecondaryTerminated()) return;
        〈Update wavelength probabilities for termination 174〉
    }
The wavelength stored in lambda[0] is always the survivor: there is no need to randomly select the surviving wavelength so long as each lambda value was randomly sampled from the same distribution as is the case with SampleUniform(), for example. Note that this means that it would be incorrect for SampledWavelengths::SampleUniform() to always place lambda[0] in a first wavelength stratum between lambda_min and lambda_min+delta, lambda[1] in the second, and so forth.6
Terminated wavelengths have their PDF values set to zero; code that computes Monte Carlo estimates using SampledWavelengths must therefore detect this case and ignore terminated wavelengths accordingly. The surviving wavelength’s PDF is updated to account for the termination event by multiplying it by the probability of a wavelength surviving termination, 1 / NSpectrumSamples. (This is similar to how applying Russian roulette affects the Monte Carlo estimator—see Section 2.2.4.)
〈Update wavelength probabilities for termination〉 ≡
    for (int i = 1; i < NSpectrumSamples; ++i)
        pdf[i] = 0;
    pdf[0] /= NSpectrumSamples;
SecondaryTerminated() indicates whether TerminateSecondary() has already been called. Because path termination is the only thing that causes zero-valued PDFs after the first wavelength, checking the PDF values suffices for this test.
〈SampledWavelengths Public Methods〉 +≡
    bool SecondaryTerminated() const {
        for (int i = 1; i < NSpectrumSamples; ++i)
            if (pdf[i] != 0) return false;
        return true;
    }
We will often have a Spectrum and a set of wavelengths for which we would like to evaluate it. Therefore, we will add a method to the Spectrum interface that provides a Sample() method that takes a set of wavelengths, evaluates its spectral distribution function at each one, and returns a SampledSpectrum. This convenience method eliminates the need for an explicit loop over wavelengths with individual calls to Spectrum::operator() in this common case. The implementations of this method are straightforward and not included here.
〈Spectrum Interface〉 +≡
    SampledSpectrum Sample(const SampledWavelengths &lambda) const;
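For reference, here is a sketch of the general pattern such an implementation might follow (written for illustration as a free function named SampleImpl; pbrt's actual per-class implementations are not shown in the text): evaluate the distribution at each sampled wavelength and collect the results.

    SampledSpectrum SampleImpl(const PiecewiseLinearSpectrum &spec,
                               const SampledWavelengths &lambda) {
        SampledSpectrum s(0.f);
        for (int i = 0; i < NSpectrumSamples; ++i)
            s[i] = spec(lambda[i]);   // one call to operator() per sampled wavelength
        return s;
    }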
Discussion
Now that SampledWavelengths and SampledSpectrum have been introduced, it is reasonable to ask the question: why are they separate classes, rather than a single class that stores both wavelengths and their sample values? Indeed, an advantage of such a design would be that it would be possible to detect at runtime if an operation was performed with two SampledSpectrum instances that stored values for different wavelengths—such an operation is nonsensical and would signify a bug in the system.
However, in practice many SampledSpectrum objects are created during rendering, many as temporary values in the course of evaluating expressions involving spectral computation. It is therefore worthwhile to minimize the object’s size, if only to avoid initialization and copying of additional data. While pbrt’s CPU-based integrators do not store many SampledSpectrum values in memory at the same time, the GPU rendering path stores a few million of them, giving further motivation to minimize their size.
Our experience has been that bugs from mixing computations at different wavelengths have been rare. With the way that computation is structured in pbrt, wavelengths are generally sampled at the start of following a ray’s path through the scene, and then the same wavelengths are used throughout for all spectral calculations along the path. There ends up being little opportunity for inadvertent mingling of sampled wavelengths in SampledSpectrum instances. Indeed, in an earlier version of the system, SampledSpectrum did carry along a SampledWavelengths member variable in debug builds in order to be able to check for that case. It was eliminated in the interests of simplicity after a few months’ existence without finding a bug.
4.6 COLOR
“Spectral distribution” and “color” might seem like two names for the same thing, but they are distinct. A spectral distribution is a purely physical concept, while color describes the human perception of a spectrum. Color is thus closely connected to the physiology of the human visual system and the brain’s processing of visual stimulus.
Although the majority of rendering computation in pbrt is based on spectral distributions, color still must be treated carefully. For example, the spectral distribution at each pixel in a rendered image must be converted to RGB color to be displayed on a monitor. Performing this conversion accurately requires using information about the monitor’s color characteristics. The renderer also finds color in scene descriptions that use it to describe reflectance and light emission. Although it is convenient for humans to use colors to describe the appearance of modeled scenes, these colors must be converted to spectra if a renderer uses spectral distributions in its light transport simulation. Unfortunately, doing so is an underspecified problem. A variety of approaches have been developed for it; the one implemented in pbrt is described in Section 4.6.6.
The tristimulus theory of color perception says that all visible spectral distributions can be accurately represented for human observers using three scalar values. Its basis is that there are three types of photoreceptive cone cells in the eye, each sensitive to different wavelengths of light. This theory, which has been tested in numerous experiments since its introduction in the 1800s, has led to the development of spectral matching functions, which are functions of wavelength that can be used to compute a tristimulus representation of a spectral distribution.
Integrating the product of a spectral distribution S(λ) with three tristimulus matching functions m{1,2,3}(λ) gives three tristimulus values vi:
vi = ∫ S(λ) mi(λ) dλ.
The matching functions thus define a color space, which is a 3D vector space of the tristimulus values: the tristimulus values for the sum of two spectra are given by the sum of their tristimulus values and the tristimulus values associated with a spectrum that has been scaled by a constant can be found by scaling the tristimulus values by the same factor. Note that from these definitions, the tristimulus values for the product of two spectral distributions are not given by the product of their tristimulus values. This nit is why using tristimulus color like RGB for rendering may not give accurate results; we will say more about this topic in Section 4.6.6.
The files util/color.h and util/color.cpp in the pbrt distribution contain the implementation of the functionality related to color that is introduced in this section.
4.6.1 XYZ COLOR
An important set of color matching functions were determined by the Commission Internationale de l’Éclairage (CIE) standards body after a series of experiments with human test subjects. They define the XYZ color space and are graphed in Figure 4.18. XYZ is a device-independent color space, which means that it does not describe the characteristics of a particular display or color measurement device.
Figure 4.18: The XYZ Color Matching Curves. A given spectral distribution can be converted to XYZ by multiplying it by each of the three matching curves and integrating the result to compute the values xλ, yλ, and zλ, using Equation (4.22).
Figure 4.19: Plot of XYZ color coefficients for the wavelengths of light in the visible range. The curve is shaded with the RGB color associated with each wavelength.
Given a spectral distribution S(λ), its XYZ color space coordinates xλ, yλ, and zλ are computed by integrating its product with the X(λ), Y (λ), and Z(λ) spectral matching curves:7
xλ = (∫ S(λ) X(λ) dλ) / (∫ Y (λ) dλ),
yλ = (∫ S(λ) Y (λ) dλ) / (∫ Y (λ) dλ),
zλ = (∫ S(λ) Z(λ) dλ) / (∫ Y (λ) dλ).    (4.22)
The CIE Y (λ) tristimulus curve was chosen to be proportional to the V (λ) spectral response curve used to define photometric quantities such as luminance in Equation (4.6). Their relationship is: V (λ) = 683 Y (λ).
Remarkably, spectra with substantially different distributions may have very similar xλ, yλ, and zλ values. To the human observer, such spectra appear the same. Pairs of such spectra are called metamers.
Figure 4.19 shows a 3D plot of the curve in the XYZ space corresponding to the XYZ coefficients for single wavelengths of light over the visible range. The coefficients for more complex spectral distributions therefore correspond to linear combinations of points along this curve. Although all spectral distributions can be represented with XYZ coefficients, not all values of XYZ coefficients correspond to realizable spectra; such sets of coefficients are termed imaginary colors.
Three functions in the Spectra namespace provide the CIE XYZ matching curves sampled at 1-nm increments from 360 nm to 830 nm.
〈Spectral Function Declarations〉 +≡
namespace Spectra {
const DenselySampledSpectrum &X();
const DenselySampledSpectrum &Y();
const DenselySampledSpectrum &Z();
}
The integral of Y (λ) is precomputed and available in a constant.
〈Spectrum Constants〉 +≡
static constexpr Float CIE_Y_integral = 106.856895;
There is also an XYZ class that represents XYZ colors.
〈XYZ Definition〉 ≡
class XYZ {
public:
〈XYZ Public Methods 178〉
〈XYZ Public Members 178〉
};
Its implementation is the obvious one, using three Float values to represent the three color components. All the regular arithmetic operations are provided for XYZ in methods that are not included in the text here.
〈XYZ Public Methods〉 ≡
    XYZ(Float X, Float Y, Float Z) : X(X), Y(Y), Z(Z) {}

〈XYZ Public Members〉 ≡
    Float X = 0, Y = 0, Z = 0;
The SpectrumToXYZ() function computes the XYZ coefficients of a spectral distribution following Equation (4.22) using the following InnerProduct() utility function to handle each component.
〈Spectrum Function Definitions〉 ≡
XYZ SpectrumToXYZ(Spectrum s) {
return XYZ(InnerProduct(&Spectra::X(), s),
InnerProduct(&Spectra::Y(), s),
InnerProduct(&Spectra::Z(), s)) / CIE_Y_integral;
}
Monte Carlo is not necessary for a simple 1D integral of two spectra, so InnerProduct() computes a Riemann sum over integer wavelengths instead:
〈Spectrum Inline Functions〉 ≡
Float InnerProduct(Spectrum f, Spectrum g) {
Float integral = 0;
for (Float lambda = Lambda_min; lambda <= Lambda_max; ++lambda)
integral += f(lambda) * g(lambda);
return integral;
}
It is also useful to be able to compute XYZ coefficients for a SampledSpectrum. Because SampledSpectrum only has point samples of the spectral distribution at predetermined wavelengths, they are found via a Monte Carlo estimate of Equation (4.22) using the sampled spectral values si at wavelengths λi and their associated PDFs:
xλ ≈ (1 / ∫ Y (λ) dλ) (1/n) Σ_i (X(λi) si) / p(λi),    (4.23)
and so forth, where n is the number of wavelength samples.
SampledSpectrum::ToXYZ() computes the value of this estimator.
〈Spectrum Method Definitions〉 +≡
XYZ SampledSpectrum::ToXYZ(const SampledWavelengths &lambda) const {
〈Sample the X, Y , and Z matching curves at lambda 179〉
〈Evaluate estimator to compute (x, y, z) coefficients 179〉
}
The first step is to sample the matching curves at the specified wavelengths.
〈Sample the X, Y , and Z matching curves at lambda〉 ≡
    SampledSpectrum X = Spectra::X().Sample(lambda);
    SampledSpectrum Y = Spectra::Y().Sample(lambda);
    SampledSpectrum Z = Spectra::Z().Sample(lambda);
The summand in Equation (4.23) is easily computed with values at hand. Here, we evaluate all terms of each sum with a single expression. Using SampledSpectrum::SafeDiv() to divide by the PDF values handles the case of the PDF being equal to zero for some wavelengths, as can happen if SampledWavelengths::TerminateSecondary() was called. Finally, SampledSpectrum::Average() conveniently takes care of summing the individual terms and dividing by n to compute the estimator’s value for each coefficient.
〈Evaluate estimator to compute (x, y, z) coefficients〉 ≡
    SampledSpectrum pdf = lambda.PDF();
    return XYZ(SafeDiv(X * *this, pdf).Average(),
               SafeDiv(Y * *this, pdf).Average(),
               SafeDiv(Z * *this, pdf).Average()) / CIE_Y_integral;
To avoid the expense of computing the x and z coefficients when only luminance is needed, there is a y() method that returns only the y coefficient. Its implementation is the obvious subset of ToXYZ() and so is not included here.
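Though the listing is omitted, a minimal sketch of that estimator, following the same pattern as ToXYZ() (illustrative code, not pbrt's listing), might look like this:
// Luminance-only Monte Carlo estimator: sample Y(lambda), weight by the
// spectrum's sampled values, divide by the PDFs, average, and normalize.
Float SampledSpectrum::y(const SampledWavelengths &lambda) const {
    SampledSpectrum Ys = Spectra::Y().Sample(lambda);
    SampledSpectrum pdf = lambda.PDF();
    return SafeDiv(Ys * *this, pdf).Average() / CIE_Y_integral;
}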
Chromaticity and xyY Color
Color can be separated into lightness, which describes how bright it is relative to something white, and chroma, which describes its relative colorfulness with respect to white. One approach to quantifying chroma is the xyz chromaticity coordinates, which are defined in terms of the XYZ color space coordinates by
$$x = \frac{x_\lambda}{x_\lambda + y_\lambda + z_\lambda}, \qquad y = \frac{y_\lambda}{x_\lambda + y_\lambda + z_\lambda}, \qquad z = \frac{z_\lambda}{x_\lambda + y_\lambda + z_\lambda} = 1 - x - y.$$
Note that any two of them are sufficient to specify chromaticity.
Figure 4.20: xy Chromaticity Diagram. All valid colors lie inside the shaded region.
Considering just x and y, we can plot a chromaticity diagram to visualize their values; see Figure 4.20. Spectra with light at just a single wavelength—the pure spectral colors—lie along the curved part of the chromaticity diagram. This part corresponds to the xy projection of the 3D XYZ curve that was shown in Figure 4.19. All the valid colors lie inside the upside-down horseshoe shape; points outside that region correspond to imaginary colors.
The xyY color space separates a color's chromaticity from its lightness. It uses the x and y chromaticity coordinates and $y_\lambda$ from XYZ, since the Y(λ) matching curve was defined to be proportional to luminance. pbrt makes limited use of xyY colors and therefore does not provide a class to represent them, but the XYZ class does provide a method that returns its xy chromaticity coordinates as a Point2f.
〈XYZ Public Methods〉 +≡
Point2f xy() const {
    return Point2f(X / (X + Y + Z), Y / (X + Y + Z));
}
A corresponding method converts from xyY to XYZ, given the xy chromaticity coordinates and optionally the $y_\lambda$ coordinate.
〈XYZ Public Methods〉 +≡
static XYZ FromxyY(Point2f xy, Float Y = 1) {
if (xy.y == 0)
return XYZ(0, 0, 0);
return XYZ(xy.x * Y / xy.y, Y, (1 - xy.x - xy.y) * Y / xy.y);
}
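For example (the values below are standard approximations, not constants from pbrt's code), the XYZ coordinates of a color with the D65 whitepoint's chromaticity, normalized to unit luminance, can be computed as:
// D65 chromaticity is approximately (0.3127, 0.3290); the result is roughly
// XYZ (0.9505, 1.0000, 1.0891).
XYZ whiteXYZ = XYZ::FromxyY(Point2f(0.3127f, 0.3290f));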
RGB Color
RGB color is used more commonly than XYZ in rendering applications. In RGB color spaces, colors are represented by a triplet of values corresponding to red, green, and blue colors, often referred to as RGB. However, an RGB triplet on its own is meaningless; it must be defined with respect to a specific RGB color space.
To understand why, consider what happens when an RGB color is shown on a display: the spectrum that is displayed is given by the weighted sum of three spectral emission curves, one for each of red, green, and blue, as emitted by the display elements, be they phosphors, LED or LCD elements, or plasma cells.8 Figure 4.21 plots the red, green, and blue distributions emitted by an LCD display and an LED display; note that they are remarkably different. Figure 4.22 in turn shows the spectral distributions that result from displaying the RGB color (0.6, 0.3, 0.2) on those displays. Not surprisingly, the resulting spectra are quite different as well.
Figure 4.21: Red, Green, and Blue Emission Curves for an LCD Display and an LED Display. The first plot shows the curves for an LCD display, and the second shows them for an LED. These two displays have quite different emission profiles. (Data courtesy of X-Rite, Inc.)
Figure 4.22: Spectral Distributions from Displaying the RGB Color (0.6, 0.3, 0.2) on LED (red) and LCD (blue) Displays. The resulting emitted distributions are remarkably different, even given the same RGB values, due to the different emission curves illustrated in Figure 4.21.
If a display's R(λ), G(λ), and B(λ) curves are known, the RGB coefficients for displaying a spectral distribution S(λ) on that display can be found by integrating S(λ) with each curve:
$$r = \int R(\lambda)\, S(\lambda)\, \mathrm{d}\lambda,$$
and so forth. The same approaches that were used to compute XYZ values for spectra in the previous section can be used to compute the values of these integrals.
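For instance, if a particular display's response curves were available as Spectrum values R, G, and B (hypothetical variables, not part of pbrt), its coefficients for a spectrum s could be computed with the same InnerProduct() routine:
// Riemann-sum RGB coefficients for s with respect to one display's curves.
Float r = InnerProduct(R, s), g = InnerProduct(G, s), b = InnerProduct(B, s);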
Alternatively, if we already have the $(x_\lambda, y_\lambda, z_\lambda)$ representation of S(λ), it is possible to convert the XYZ coefficients directly to corresponding RGB coefficients. Consider, for example, computing the value of the red component for a spectral distribution S(λ):
$$r = \int R(\lambda)\, S(\lambda)\, \mathrm{d}\lambda \approx \int R(\lambda) \left( x_\lambda X(\lambda) + y_\lambda Y(\lambda) + z_\lambda Z(\lambda) \right) \mathrm{d}\lambda,$$
where the second step takes advantage of the tristimulus theory of color perception.
The integrals of the products of an RGB response function and an XYZ matching function can be precomputed for given response curves, making it possible to express the full conversion as a matrix:
$$\begin{pmatrix} r \\ g \\ b \end{pmatrix} =
\begin{pmatrix}
\int R(\lambda) X(\lambda)\,\mathrm{d}\lambda & \int R(\lambda) Y(\lambda)\,\mathrm{d}\lambda & \int R(\lambda) Z(\lambda)\,\mathrm{d}\lambda \\
\int G(\lambda) X(\lambda)\,\mathrm{d}\lambda & \int G(\lambda) Y(\lambda)\,\mathrm{d}\lambda & \int G(\lambda) Z(\lambda)\,\mathrm{d}\lambda \\
\int B(\lambda) X(\lambda)\,\mathrm{d}\lambda & \int B(\lambda) Y(\lambda)\,\mathrm{d}\lambda & \int B(\lambda) Z(\lambda)\,\mathrm{d}\lambda
\end{pmatrix}
\begin{pmatrix} x_\lambda \\ y_\lambda \\ z_\lambda \end{pmatrix}.$$
pbrt frequently uses this approach in order to efficiently convert colors from one color space to another.
An RGB class that has the obvious representation and provides a variety of useful arithmetic operations (not included in the text) is also provided by pbrt.
〈RGB Definition〉 ≡
class RGB {
public:
〈RGB Public Methods 182〉
〈RGB Public Members 182〉
};
〈RGB Public Methods〉 ≡
RGB(Float r, Float g, Float b) : r(r), g(g), b(b) {}
〈RGB Public Members〉 ≡
Float r = 0, g = 0, b = 0;
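As a concrete illustration of the matrix conversion described above (a sketch using a hypothetical row-major matrix m, not pbrt's actual conversion code, which is encapsulated by the RGBColorSpace class introduced shortly), applying a precomputed 3×3 XYZ-to-RGB matrix amounts to a small matrix-vector product:
// Convert an XYZ color to RGB using a precomputed 3x3 matrix m (row-major).
RGB XYZToRGB(const Float m[3][3], const XYZ &xyz) {
    return RGB(m[0][0] * xyz.X + m[0][1] * xyz.Y + m[0][2] * xyz.Z,
               m[1][0] * xyz.X + m[1][1] * xyz.Y + m[1][2] * xyz.Z,
               m[2][0] * xyz.X + m[2][1] * xyz.Y + m[2][2] * xyz.Z);
}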
Full spectral response curves are not necessary to define color spaces. For example, a color space can be defined using xy chromaticity coordinates to specify three color primaries. From them, it is possible to derive matrices that convert XYZ colors to and from that color space. In cases where we do not otherwise need explicit spectral response curves, this is a convenient way to specify a color space.
The RGBColorSpace class, which is defined in the files util/colorspace.h and util/colorspace.cpp, uses this approach to encapsulate a representation of an RGB color space as well as a variety of useful operations like converting XYZ colors to and from its color space.
〈RGBColorSpace Definition〉 ≡
class RGBColorSpace {
public:
〈RGBColorSpace Public Methods 184〉
private:
〈RGBColorSpace Private Members 184〉
};
An RGB color space is defined using the chromaticities of red, green, and blue color primaries. The primaries define the gamut of the color space, which is the set of colors it can represent with RGB values between 0 and 1. For three primaries, the gamut forms a triangle on the chromaticity diagram where each primary’s chromaticity defines one of the vertices.9
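For example (standard published values, not constants from pbrt's source), the familiar sRGB color space is specified by primaries at approximately the following chromaticities, together with the D65 whitepoint discussed next:
// Approximate sRGB (ITU-R BT.709) primary chromaticities and D65 whitepoint.
Point2f rPrimary(0.64f, 0.33f), gPrimary(0.30f, 0.60f), bPrimary(0.15f, 0.06f);
Point2f whitepoint(0.3127f, 0.3290f);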
In addition to the primaries, it is necessary to specify the color space’s whitepoint, which is the color that is displayed when all three primaries are activated to their maximum emission. It may be surprising that this is necessary—after all, should not white correspond to a spectral distribution with the same value at every wavelength? White is, however, a color, and as a color it is what humans perceive as being uniform and label “white.” The spectra for white colors tend to have more power in the lower wavelengths that correspond to blues and greens than they do at higher wavelengths that correspond to oranges and reds. The D65 illuminant, which was described in Section 4.4.2 and plotted in Figure 4.14, is a common choice for specifying color spaces’ whitepoints.
While the chromaticities of the whitepoint are sufficient to define a color space, the RGBColorSpace constructor takes its full spectral distribution, which is useful for forthcoming code that converts from color to spectral distributions. Storing the illuminant spectrum allows users of the renderer to specify emission from light sources using RGB color; the provided illuminant then gives the spectral distribution for RGB white, (1, 1, 1).
〈RGBColorSpace Method Definitions〉 ≡
RGBColorSpace::RGBColorSpace(Point2f r, Point2f g, Point2f b,
Spectrum illuminant, const RGBToSpectrumTable *rgbToSpec,
Allocator alloc)
: r(r), g(g), b(b), illuminant(illuminant, alloc),
rgbToSpectrumTable(rgbToSpec) {
〈Compute whitepoint primaries and XYZ coordinates 184〉
〈