
How to Copy

Intro:

The Copy Stamp node has been deprecated for some time now, yet a lot of people still seem to use it: long-time Houdini users keep using it out of habit, and newcomers watch old tutorials and learn the old way from the start.
So here is a small comparison of different copy-stamp methods, including the old one and a couple of variations on how you can do the same thing with the new for-each loop and compiled blocks.
The setups are available in the included hip file, so here we will concentrate on the tests and benchmarks.

Test description:

Each test has 2 parameters: the total number of template points to copy objects to, and the proportion of unique stamping values on those template points. Each unique stamping value generates a uniquely deformed copy of the geometry.
For example, test 1000:0.25 means that there are 1000 template points with approximately 1000*0.25 = 250 unique stamping values used to deform the geometry (in these tests, as an offset for the Mountain node).
(1000:0 means there is a single stamping value for all 1000 template points.)
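To make the N:p notation concrete, here is a minimal Python sketch of how such a test case could be generated. It is an assumption, not the setup from the hip file: `stamp_values` is a hypothetical helper that gives exactly max(1, round(N*p)) distinct integer stamp values to shuffled template points, whereas the actual tests only match that count approximately.

```python
import random


def stamp_values(n_points, proportion, seed=0):
    """Hypothetical generator for an N:p test case (not from the hip file).

    Returns one integer stamp value per template point. Values cycle through
    a pool of max(1, round(n_points * proportion)) ids, so every id in the
    pool is used at least once (as long as the pool is not larger than
    n_points), and the list is shuffled so equal ids are scattered.
    """
    pool = max(1, round(n_points * proportion))
    values = [i % pool for i in range(n_points)]
    random.Random(seed).shuffle(values)
    return values


if __name__ == "__main__":
    for n, p in [(1000, 0.25), (1000, 0), (100000, 0.0625)]:
        vals = stamp_values(n, p)
        print(f"{n}:{p} -> {len(vals)} points, {len(set(vals))} unique stamps")
```

With this scheme, 1000:0 yields a single stamp value shared by all points, which is the "no stamping" case discussed in the results.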

Methods description:

You will see the testing subnets in the performance monitor screenshots. Although you can download the file and see for yourself which methods are used, here is a quick description of each one to help the reader understand the results.

copystamp: 

The simple old way of copy-stamping, with a stamp expression in the Mountain node.

foreachcopy: 

A for-each loop running by number over the incoming template points. Each iteration deforms the geometry with a Mountain SOP and moves it with a Transform node, and the resulting copies are merged together. The transformation and id attributes are picked up with the point and detail hscript functions, so no extra geometry processing is needed, which should save time and memory.
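The by-number loop can be sketched in plain Python to show its structure. This is only an analogy under stated assumptions: geometry is a list of (x, y, z) tuples, template points are dicts with hypothetical "P" and "id" keys, and `mountain` and `translate` are toy stand-ins for the Mountain SOP and Transform node (the sine/cosine displacement is not Houdini's actual noise).

```python
import math


def mountain(geo, offset):
    """Toy stand-in for the Mountain SOP: displaces y by a smooth function
    of x, z and the stamped offset (not Houdini's real noise)."""
    return [(x, y + 0.1 * math.sin(3.0 * x + offset) * math.cos(3.0 * z + offset), z)
            for x, y, z in geo]


def translate(geo, p):
    """Toy stand-in for the Transform node: moves every point by p."""
    px, py, pz = p
    return [(x + px, y + py, z + pz) for x, y, z in geo]


def foreachcopy(geo, template):
    """By-number loop: iteration i reads template point i, similar to
    fetching it with point() in hscript, then deforms and moves a copy."""
    merged = []
    for i in range(len(template)):
        pt = template[i]
        piece = translate(mountain(geo, pt["id"]), pt["P"])
        merged.extend(piece)  # like a Block End gathering pieces by merging
    return merged
```

Note that every iteration recomputes the deformation, even when several template points share the same id.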

foreachcopy_copytopoints:

This method is similar to the previous one, but instead of moving each piece with a Transform node driven by hscript references in its parameters, it places each piece with a Copy to Points node.
As you can see, a Blast node is required here to isolate the single point of the current iteration.
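Why the Blast is needed can be shown with a toy Python analogy: Copy to Points places a copy on every incoming point, so without reducing the template to the current point, each iteration would copy onto all of them. `copy_to_points` and `blast_keep` are hypothetical stand-ins (the latter mimics a Blast set to keep only the selected point), and the Mountain deformation is left out for brevity.

```python
def copy_to_points(geo, points):
    """Toy Copy to Points: one translated copy of geo per incoming point."""
    out = []
    for px, py, pz in points:
        out.extend((x + px, y + py, z + pz) for x, y, z in geo)
    return out


def blast_keep(points, index):
    """Toy Blast that deletes every point except point `index`."""
    return points[index:index + 1]


geo = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

# Without the Blast, each of the 3 iterations copies onto all 3 points
# (9 pieces in total); with it, each iteration contributes exactly one piece.
without_blast = [copy_to_points(geo, template) for i in range(len(template))]
with_blast = [copy_to_points(geo, blast_keep(template, i)) for i in range(len(template))]
```

The extra Blast per iteration is additional geometry processing that the Transform-based variant does not need.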

foreachcopy_anotherway:

This method is also based on a for-each loop, but instead of looping "by number" it loops "for each point" of the incoming point cloud. Deformations and transformations are done the same way as in the foreachcopy method.
The difference is that the Mountain and Transform nodes no longer pick a point from the incoming cloud using the iteration number from the metadata; instead, they read the single point that foreach_begin provides for each iteration.

foreachcopy_byid:

This is a two-loop method: the outer loop goes over the values of the id attribute on the template points, and the inner loop is the same as in the foreachcopy method, except that it takes the id from the outer loop. Therefore the stamped geometry is calculated only once for each id value on the template points.
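The key idea of the two-loop method, computing the deformation once per id and reusing it for every point with that id, can be sketched in Python. This is an illustration under assumptions, not the hip file's network: `foreachcopy_byid`, `deform` and `move` are hypothetical names, and template points are dicts with "P" and "id" keys.

```python
from collections import defaultdict


def foreachcopy_byid(geo, template, deform, move):
    """Two-level loop sketch (hypothetical helper names).

    Outer loop: one iteration per distinct id, where the stamped
    deformation is computed once. Inner loop: the deformed piece is
    moved to every template point that shares that id.
    """
    points_by_id = defaultdict(list)
    for pt in template:
        points_by_id[pt["id"]].append(pt["P"])

    merged = []
    for stamp_id, positions in points_by_id.items():
        piece = deform(geo, stamp_id)  # computed once per id
        for p in positions:
            merged.extend(move(piece, p))
    return merged


if __name__ == "__main__":
    calls = []

    def deform(geo, offset):
        calls.append(offset)  # count how many deformations are computed
        return [(x, y + offset, z) for x, y, z in geo]

    def move(geo, p):
        return [(x + p[0], y + p[1], z + p[2]) for x, y, z in geo]

    template = [{"P": (float(i), 0.0, 0.0), "id": i % 3} for i in range(10)]
    out = foreachcopy_byid([(0.0, 0.0, 0.0)], template, deform, move)
    print(len(out), "points,", len(calls), "deformations")  # 10 points, 3 deformations
```

One design consequence: the copies come out grouped by id rather than in template-point order, which matters only if downstream nodes rely on point order.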

Each method has its own compiled version, marked with the _compiled suffix.


Unaccounted factors

Houdini caches and optimizes a lot of things by itself. These tests were run sequentially, so although all geometry was always dirtied and had to be recalculated, some nodes might have taken advantage of the consistent template geometry topology across tests within the same template point count group. Such effects were not spotted when tests were launched in random order, but they were not studied properly and were simply considered insignificant.

Testing machine

ASUS ROG GL553VW
CPU: i7-6700HQ (Skylake-H), 2.6 GHz, no tweaks
Memory: 32 GB DDR4-2133 (the memory limit was not reached in the tests, so no swapping)
System: Windows 10
Houdini: 17.0.352

Benchmarks

Note that copystamp was benchmarked separately, so its red bar does not match the others' scale. Copy stamp is still the slowest in every test except the :0 ones, where, thanks to caching, it outruns the non-compiled versions of the foreach methods and, with large numbers of template points, even the compiled ones.


100 template points

100:0  (100 template points, 1 unique stamp value)
run 1:
run 2:

100:0.25  (100 template points, ~25 unique stamp values)
run 1:
run 2:

100:0.5  (100 template points, ~50 unique stamp values)
run 1:
run 2:

100:0.75  (100 template points, ~75 unique stamp values)
run 1:
run 2:

100:1  (100 template points, ~100 unique stamp values)
run 1:
run 2:

1000 template points

1000:0  (1000 template points, 1 unique stamp value)
run 1:
run 2:

1000:0.25  (1000 template points, ~250 unique stamp values)
run 1:
run 2:

1000:0.5  (1000 template points, ~500 unique stamp values)
run 1:
run 2:

1000:0.75  (1000 template points, ~750 unique stamp values)
run 1:
run 2:

1000:1  (1000 template points, ~1000 unique stamp values)
run 1:
run 2:

10000 template points

10000:0  (10000 template points, 1 unique stamp value)
run 1:
run 2:

10000:0.25  (10000 template points, ~2500 unique stamp values)
run 1:
run 2:

10000:0.5  (10000 template points, ~5000 unique stamp values)
run 1:
run 2:

10000:0.75  (10000 template points, ~7500 unique stamp values)
run 1:
run 2:

10000:1  (10000 template points, ~10000 unique stamp values)
run 1:
run 2:

100000 template points

100000:0  (100000 template points, 1 unique stamp value)
run 1:
run 2:

100000:0.0625  (100000 template points, ~6250 unique stamp values)
run 1:
run 2:

100000:0.125  (100000 template points, ~12500 unique stamp values)
run 1:
run 2:

100000:0.25  (100000 template points, ~25000 unique stamp values)
run 1:
run 2:

100000:0.5  (100000 template points, ~50000 unique stamp values)
run 1:
run 2:

100000:0.75  (100000 template points, ~75000 unique stamp values)
run 1:
run 2:

100000:1  (100000 template points, ~100000 unique stamp values)
run 1:
run 2:

Result analysis

Right away you can see how impossibly slow the old copy-stamp way is compared to any of the foreach methods. In fact, it dropped out of testing after 10000 points, because it took more than 30 minutes to cook.
The second thing you may notice right away is that all compiled methods are more than 6 times faster than their non-compiled counterparts. This is mostly due to multithreading: the i7-6700HQ has 4 cores and 8 hardware threads, so a gain of 4 to 8 times was expected.
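Multithreading can pay off here because the loop iterations do not depend on each other. The following Python sketch shows only that structural property, not how Houdini schedules compiled blocks internally: `process_point` is a hypothetical stand-in for one iteration, `workers=8` merely mirrors the 8 hardware threads above, and because of CPython's global interpreter lock this code will not reproduce the 4 to 8 times speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def process_point(args):
    """One loop iteration (toy stand-in): offset a single-point geometry
    by the stamp value and place it at the template position."""
    position, stamp = args
    x, y, z = position
    return (x, y + 0.01 * stamp, z)


def run_serial(template):
    return [process_point(t) for t in template]


def run_parallel(template, workers=8):
    # Iterations never read each other's results, so they can be mapped
    # over a pool of workers. map() returns results in input order, so the
    # merged output is identical to the serial loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_point, template))
```

If an iteration depended on the result of the previous one (a feedback loop), this kind of parallel mapping would no longer be valid.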

Other than that, all compiled methods run very close to each other.
foreachcopy_copytopoints_compiled always runs about 25% slower than its closest competitor.
Except for the :0 tests, where there is only a single id value on the template points and all methods take advantage of the cached Mountain node geometry calculated once, foreachcopy_byid_compiled always leads by around 10% on average. Its advantage peaks at :0.0625 (and possibly at lower values, as long as they are above :0), where it runs up to 25% faster: it takes full advantage of a small number of sets, each containing many template points with the same id, and calculates the stamped deformation only once per set.

For the non-compiled methods, generally as with the compiled ones, all methods show the same order of performance.
foreachcopy_copytopoints is always a bit slower than the others.
The foreachcopy and foreachcopy_anotherway methods are practically indistinguishable, while foreachcopy_byid's performance depends on the number of unique stamps more strongly than in the compiled version: it ranges from about 5% slower with mostly unique stamps to, as in the compiled version, up to 25% faster with few unique stamp values.


Conclusion

Copy Stamp: judging from these results, the old copy method may still be reasonable ONLY when the number of unique stamps is very close to 1 (basically, no stamping at all). In that case it even seems able to outrun the compiled methods on large numbers of template points (possibly because a compiled block is still slower than a C++ copy node, though that was not tested).
So Copy Stamp is still good, but only when no stamping is involved.

Compilable deformations: all compiled methods showed very similar performance, with at most 25% variation, so in general it doesn't matter much which one you use.
Using a Copy to Points node inside the loop consistently caused a performance drop.
Generally, foreachcopy_byid was the fastest for any number of unique stamp values other than one, probably peaking somewhere between :0 and :0.25.


Uncompilable deformations: it's not always possible to make all deformations compilable, and in those cases a non-compiled foreach approach has to be used.
Generally, as with the compilable case, all methods show the same order of performance.
If you have a lot of unique stamps, for example when copying random time shifts for a simple crowd sim, you might want to go with the foreachcopy or foreachcopy_anotherway methods.
But if you know you have only a few unique stamp values compared to the total number of template points, you should go with the foreachcopy_byid method to gain some extra performance (up to about 25% in these tests).

This test is pretty rough and was run on a simple task, so its results should not be taken as ground truth for any general setup or case.

Source file: