User:Yiming/Proposal 2018/UserDocument/Performance

Performance Issues With LANPR

Threading

The threading problem mainly concerns LANPR's software mode, in which the vector occlusion test runs on multiple CPU threads.

Currently, LANPR uses Blender's thread setting from the render Performance panel. By default the value is detected automatically (for example, on my i7-4910MQ it gives 8). There is still a minor, unsolved data conflict between threads, so if the calculation gives strange results (such as flickering or wrong occlusion), please set the thread count to one.
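The same workaround can be applied from a script. The sketch below assumes the `threads_mode` and `threads` properties of Blender's render settings, which back the Performance panel. `force_single_thread` is a hypothetical helper name, and the stand-in object exists only so the snippet runs outside Blender.

```python
from types import SimpleNamespace


def force_single_thread(render_settings):
    """Switch a RenderSettings-like object to one fixed thread.

    Returns the previous (threads_mode, threads) pair so it can be restored.
    """
    previous = (render_settings.threads_mode, render_settings.threads)
    render_settings.threads_mode = 'FIXED'  # turn off automatic detection
    render_settings.threads = 1             # one thread, so no data conflict
    return previous


# Stand-in for bpy.context.scene.render (assumption: used only for this demo).
fake_render = SimpleNamespace(threads_mode='AUTO', threads=8)
previous = force_single_thread(fake_render)
```

Inside Blender, pass `bpy.context.scene.render` instead of the stand-in, and keep the returned pair if you want to restore the old setting later.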

Because LANPR has a 2D acceleration structure, the calculation is sped up even when only one thread is used. LANPR is expected to still easily beat Freestyle on speed even with one thread. So if your use case cannot tolerate any occlusion calculation error, please use a single thread until this problem has been solved.
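To give a rough idea of why a 2D acceleration structure helps regardless of thread count, here is a minimal sketch of a uniform grid over projected bounding boxes. This is an illustration only and not LANPR's actual data structure; the function names, the cell size, and the sample boxes are all hypothetical.

```python
from collections import defaultdict


def build_grid(boxes, cell_size):
    """Map each grid cell to the indices of the 2D boxes overlapping it.

    boxes: list of (min_x, min_y, max_x, max_y) in projected image space.
    """
    grid = defaultdict(list)
    for index, (min_x, min_y, max_x, max_y) in enumerate(boxes):
        for cx in range(int(min_x // cell_size), int(max_x // cell_size) + 1):
            for cy in range(int(min_y // cell_size), int(max_y // cell_size) + 1):
                grid[(cx, cy)].append(index)
    return grid


def candidates_at(grid, cell_size, x, y):
    """Return only the boxes stored in the cell containing point (x, y)."""
    return grid.get((int(x // cell_size), int(y // cell_size)), [])


# Three projected triangle bounding boxes (hypothetical data).
boxes = [(0, 0, 10, 10), (40, 40, 55, 55), (90, 0, 99, 9)]
grid = build_grid(boxes, cell_size=25)
near = candidates_at(grid, 25, 45, 45)
```

An occlusion test at a given point then only has to look at the few candidates in one cell instead of every triangle in the scene, which is where the single-thread speedup comes from.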

The problem description can be found in the Development Notes.

GPU and Bus Bandwidth

A large part of LANPR's rendering is done on the GPU, including all drawing commands and some calculation commands. This means a serious amount of data travels from client memory to VRAM each time the drawing refreshes. For data-heavy modes such as DPIX, a large amount of VRAM is expected (2GB+ preferred, and the more the better). Currently DPIX's GPU texture cache is fixed at a size of 2048px, which in total consumes ~700MB of VRAM uncompressed. In the future this value will certainly become dynamic, but big scenes will still require a lot of VRAM.
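For a feeling of the sizes involved, the sketch below works out the uncompressed size of a single 2048px texture. The RGBA 32-bit float format is an assumption made for illustration; this page does not state which formats DPIX uses or how many textures make up the ~700MB total.

```python
def texture_bytes(size_px, channels=4, bytes_per_channel=4):
    """Uncompressed size of one square texture in bytes."""
    return size_px * size_px * channels * bytes_per_channel


MIB = 1024 * 1024

# Assumed format: RGBA with one 32-bit float per channel.
one_texture = texture_bytes(2048)

# How many such textures would fit into a 700 MiB budget.
textures_in_budget = (700 * MIB) // one_texture

print(one_texture // MIB, "MiB per texture")
print(textures_in_budget, "textures fit in 700 MiB")
```

Under that assumption a handful of full-size textures is already enough to reach the stated total, which is why a fixed 2048px cache is demanding on cards with little VRAM.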

Because the amount of data sent to the GPU is usually very large, there can be a lag (sometimes several seconds) before the UI responds and anything can be drawn. With higher bandwidth on the motherboard bus and on your video card, this time can be greatly reduced. The driver can be an issue too, but its effect is not yet known.

Memory Usage

Memory usage has become a big issue in Freestyle. Because it uses a different calculation strategy, LANPR may or may not use less memory than Freestyle. The internal data structure has been simplified to reduce memory usage, but the actual amount depends on the 3D and 2D projected structure of your scene (much as with a BVH tree).

Another problem with memory usage is that LANPR uses double-precision floating-point numbers internally to meet the precision requirements of the algorithm. This may not be easy to optimize.
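The trade-off can be seen with a small sketch that compares storage for double and single precision coordinates and measures the rounding error single precision introduces. The vertex count is a hypothetical figure, and `coordinate_bytes` is an illustrative helper, not part of LANPR.

```python
from array import array


def coordinate_bytes(vertex_count, typecode):
    """Bytes needed to store x, y, z for each vertex with the given type."""
    return vertex_count * 3 * array(typecode).itemsize


vertices = 1_000_000  # hypothetical scene size
as_double = coordinate_bytes(vertices, 'd')  # 8 bytes per coordinate
as_float = coordinate_bytes(vertices, 'f')   # 4 bytes per coordinate

# Round-trip a value through single precision to see the rounding error.
value = 0.1
single = array('f', [value])[0]
error = abs(single - value)
```

Double precision costs twice the memory for coordinate data, but switching to single precision adds a rounding error to every stored value, which is the kind of loss the algorithm's precision requirements rule out.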