Open
Conversation
|
I am happy with where this is, I think the only thing I could pessimistically think of is a test (with fixed seed) that equates the result with a known quantity. |
1450688 to
18b22ea
Compare
Author
|
I moved the consistency test to the start of the commit chain and remade it after disabling Then I turned |
4dd81ff to
4184349
Compare
Turning off -ffast-math is because otherwise any little compile flags change will change the output. Also remove unwanted library_dirs and include_dirs.
Fix the Makefile dependencies so things get rebuilt when they should be.
Add a `test` target.
Only produce one .so file, which knows how to run both 2d and 3d.
Break out main and the I/O methods into a separate cpp file, which is
not used for the python extension module.
Do not create pointless objects. The methods are all static. TSNE is
now a namespace, not a class, and the exported `run` method is wrapped
in `extern "C"` to make linkage easier.
Minor cleanup of includes / using statements.
Because I just couldn't help myself, remove a couple places where
new/delete were being used for no good reason.
Get rid of a few places where NDIMS (the build time constant) and a
run-time variable, which are supposed to have the same value, were both
being used in a method. This means flowing the template a bit further
up the tree. There's lots more places this could be done - this only
addresses the cases where not doing so could lead to errors if the
template variable was inconsistent with the run-time variable.
Kill off the last pointless vestiges of the cblas dependency.
Use hidden visibility where appropriate - only export the run() method.
This cuts the size of the .so and allows the compiler to inline more
aggressively, especially since we've also turned on LTO.
Together, all of these changes result in a build with a single 901k
extension module, instead of two 1.9MB extension modules.
Set -std=c++14 in the build. This is required by some of the cleanups
because of the use of the `nullptr` keyword. Older gcc will not default
to c++11 or higher, so will fail to build nullptr.
And c++17 doesn't work because
```
include/python2.7/stringobject.h:175:5: error:
ISO C++17 does not allow 'register' storage class specifier [-Wregister]
register Py_ssize_t *len /* pointer to length variable or NULL
^~~~~~~~~
```
Update travis build to Xenial container. This updates gcc to 5.4, which has
the required c++ standards support.
Also flow down templating to a few more places, and use arrays instead of vectors where possible.
Allocate an array of children, rather than having an array of child pointers. Cleanup useless methods, constness.
This decreases the in-memory size of the tree, which significantly increases the speed since it's easier for the CPU to fit it in cache. There's more that can be done with this - in particular, the boundary. But that's more complicated.
Rather than downloading an entire miniconda distribution, just `pip install` the requisite packages. Should speed up the build significantly. Also, use clang compiler only, because otherwise we fail the numerical stability test (which is ok, generally, but annoying). Use lld to link, because the llvm LTO plugin for ld.gold doesn't work on travis. Turn of stability test on CI. Minor compiler optimization differences can destroy repeatability. Getting it to not do that is too complicated.
4184349 to
8c86710
Compare
70d6d93 to
6f1065d
Compare
1. Don't copy the data. Just keep a reference. 2. Don't store the dimension in every datapoint. That's ridiculously inefficient. 3. Don't store the index in every datapoint. Just compute it as needed from pointer arithmetic.
This allows for more sse optimizations in a few cases.
Makes the datapoints smaller and allows vectorization in initialization to work better.
In addition to being more clear, this allows the compiler to recognize that pos_f is not aliased anywhere else so it can vectorize the calculation better.
Not only was that an important slowdown, but it also impacted repeatability across libm implementations.
Author
|
On a somewhat larger dataset, which took around a minute on marsoc, running on hydra we see which is roughly consistent with the improvement I expected to see. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix the Makefile dependencies so things get rebuilt when they should be.
Only produce one .so file, which knows how to run both 2d and 3d.
Break out main and the I/O methods into a separate cpp file, which is
not used for the python extension module.
Do not create pointless objects. The methods are all static. TSNE is
now a namespace, not a class, and the exported
runmethod is wrappedin
extern "C"to make linkage easier.Minor cleanup of includes / using statements.
Because I just couldn't help myself, remove a couple places where
new/delete were being used for no good reason.
Get rid of a few places where NDIMS (the build time constant) and a
run-time variable, which are supposed to have the same value, were both
being used in a method. This means flowing the template a bit further
up the tree. There's lots more places this could be done - this only
addresses the cases where not doing so could lead to errors if the
template variable was inconsistent with the run-time variable.
Kill off the last pointless vestiges of the cblas dependency.
Use hidden visibility where appropriate - only export the run() method.
This cuts the size of the .so and allows the compiler to inline more
aggressively, especially since we've also turned on LTO.
Together, all of these changes result in a build with a single 901k
extension module, instead of two 1.9MB extension modules.