User:Ankitm/GSoC 2020/Final report

Improving IO performance for big files - Final report

I wholeheartedly thank my mentors, Dr. Sybren Stüvel and Dr. Howard Trickey, for their patience, code review, and constant guidance. I also thank Jacques Lucke for reviewing the code throughout the project and for extending functionality in the master branch that the project needed.

Introduction

The project aimed to improve the import/export time of two 3D file formats, PLY and STL, by rewriting their importers and exporters in C++. The goals were adjusted at the beginning of the program to prioritize Wavefront OBJ, since it supports more features.

Status

The Wavefront OBJ (and material library, MTL) exporter and importer work as replacements for the older ones. Unfortunately, I could not work on PLY and STL. I will work on them in the coming months.

Code structure

Detailed information can be found in the design docs, so here's a broad overview of the code structure.
source/blender/editors/io_obj.c has the operator definition and the file browser UI drawing code. The C interface is in source/blender/io/wavefront/IO_wavefront_obj.h. The implementation is in the source/blender/io/wavefront/intern directory.

Comments

Here are a few comments I thought were worth mentioning.

Significant speed gains came directly from the language itself and from making fewer memory allocations. Beyond that, some specific modifications that gave a speedup were:

  • Use std::fprintf instead of std::ofstream to write the files. The former was consistently about 40-50% faster for writing the same file. reference: week 2
  • Fetching data at the time of writing, instead of storing it in lists (and duplicating the scene in memory). My initial concern was that doing more work while writing could slow it down, but profiling showed that time was actually saved by removing the large memory allocations. reference: week 3
  • Use std::stof/std::stoi to convert strings to numbers instead of the stream extraction operator >>. While the latter is convenient and safe, it is slow, much as std::ofstream was for writing. reference: week 7
  • Avoided BMesh. It's slightly easier to create vertices using a BMesh, but the downsides were: more memory usage, slow creation & slow conversion to Mesh. reference: week 7
  • Minimise string allocations and operations by using blender::StringRef instead of creating new strings (e.g., when splitting a line into components). reference: week 10
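
As a rough illustration of three of the points above (copy-free splitting, std::stof conversion, and std::fprintf output), here is a minimal standalone sketch. It is not the code from the project: std::string_view stands in for blender::StringRef, and the helper names split_by_char, parse_vertex, and write_vertex are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

/* Split `line` on `delimiter` without copying characters: every piece is a
 * view into `line`. std::string_view is used here as a stand-in for
 * blender::StringRef (an assumption made to keep the sketch standalone). */
std::vector<std::string_view> split_by_char(std::string_view line, char delimiter)
{
  std::vector<std::string_view> parts;
  std::size_t start = 0;
  while (start <= line.size()) {
    std::size_t end = line.find(delimiter, start);
    if (end == std::string_view::npos) {
      end = line.size();
    }
    if (end > start) { /* Skip empty pieces caused by repeated delimiters. */
      parts.push_back(line.substr(start, end - start));
    }
    start = end + 1;
  }
  return parts;
}

/* Parse a line of the form "v x y z" into three floats. Returns false if the
 * line is not a "v" line with three numbers that std::stof accepts. */
bool parse_vertex(std::string_view line, float r_coords[3])
{
  const std::vector<std::string_view> parts = split_by_char(line, ' ');
  if (parts.size() != 4 || parts[0] != "v") {
    return false;
  }
  try {
    for (int i = 0; i < 3; i++) {
      /* std::stof takes a std::string, so one short temporary is created per
       * number; the line itself is never copied. */
      r_coords[i] = std::stof(std::string(parts[i + 1]));
    }
  }
  catch (const std::exception &) {
    /* std::stof throws std::invalid_argument or std::out_of_range. */
    return false;
  }
  return true;
}

/* Write one vertex straight to the file with std::fprintf, without first
 * collecting all lines in an intermediate list. */
void write_vertex(std::FILE *file, const float coords[3])
{
  assert(file != nullptr);
  std::fprintf(file, "v %f %f %f\n", coords[0], coords[1], coords[2]);
}
```

A caller would read the file line by line, pass each line to parse_vertex, and on export call write_vertex once per vertex while iterating over the mesh. Whether this gives gains similar to those reported above depends on the data and the platform, so it should be profiled rather than assumed.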

For profiling, Instruments.app was used. It comes bundled with Xcode and is easy to use. Other tools I tried were gperftools, dtrace with Flamegraph (script: P1161), and py-spy for Python scripts.

Advice I'd give to my past self

  • Try to make reasonable time estimates for milestones. At first, plans may go wrong, and one may find oneself not meeting deadlines. But with time, as one gets more familiar with the codebase, estimates improve.
  • Discuss the design before implementing it. Refactoring will be a time sink later on.

Comparisons

The following tables note the time taken by wm_fileselect_handler_do, which is the last function common to both the old and new implementations before their code paths diverge. The time the user takes to select the file is not counted.
Enabled export settings: write materials, write normals, write UV coordinates.

Default cube + 9 subdivision surface levels (triangulated faces)

          File Size (MB)   Time Old (s)   Time New (s)   Speed-up
Import    365              101            15             6.7x
Export    365              132            14             9.4x

Default cylinder: 2038 copies (non-triangulated faces)

          File Size (MB)   Time Old (s)   Time New (s)   Speed-up
Import    20               84.6           1.5            56x
Export    20               12.3           0.9            13x

Future work

  • As mentioned above, PLY and STL are pending.
  • Currently the MTL exporter and importer have their own classes for nodetree operations. The importer uses the ShaderNodetreeWrap class to create a nodetree from the materials in an MTL file. The exporter uses the MaterialWrap class to traverse an Object's nodetree and extract the data for the MTL file. It would be good to extend them for any other writer/reader that operates on material data. On the Python side, node_shader_utils.py provides such a utility.
  • All three file formats have shared functionality that would be good to extract into source/blender/io/common to reduce code duplication. For example, exporters need to obtain object names, vertex coordinates, the total number of elements, etc. Similarly for the importers, if coordinates are stored in flat lists, the mesh creation code would be the same.
  • ngon_tessellate, which was ported directly from mesh_utils.py to mesh_utils.cc, can be improved further to be generic. Currently, the way it accepts arguments and returns values is tightly coupled to the OBJ importer.
  • Support more options for NURBS curves and surfaces. This is hindered in part by the lenient file format specification, e.g., some implementations use vp instead of v for vertex coordinates, curv2 instead of curv, etc.