#include <iostream>
#include <vector>
#include <time.h>

typedef std::vector<double> DoubleVector;

int main() {
    int nhist(70000);
    int nbins(2000);

    std::vector<DoubleVector*> dataY(nhist);
    std::vector<DoubleVector*> dataE(nhist);
    std::vector<DoubleVector*> dataX(nhist);

    // Allocate all of the vectors from multiple threads.
    #pragma omp parallel for
    for (int i = 0; i < nhist; ++i) {
        dataY[i] = new DoubleVector(nbins);
        dataE[i] = new DoubleVector(nbins);
        dataX[i] = new DoubleVector(nbins);
    }

    // Time only the (serial) deallocation.
    clock_t start = clock();
    for (int i = 0; i < nhist; ++i) {
        delete dataY[i];
        delete dataE[i];
        delete dataX[i];
    }
    clock_t now = clock();

    const float retval = float(now - start) / CLOCKS_PER_SEC;
    std::cout << "Time for deallocation=" << retval << " seconds." << std::endl;
    return 0;
}
The code above demonstrates an issue we have observed in our application, which is compiled with Visual Studio 2012 Update 3.
Allocating a large amount of memory from inside an OpenMP parallel loop causes the subsequent deallocation time to increase quite severely. Running the example above in release mode, deallocating all of the vectors takes ~1s on my system. If I comment out the '#pragma omp parallel for' line, the deallocation time drops to ~0.3s.
Furthermore, if I increase nbins to ~2500, the deallocation time balloons to ~340s!
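One comparison that might help isolate the cause is to perform the deallocation from a parallel loop as well, so that the threads which built up the heap state also tear it down. Below is a minimal drop-in replacement for the timed loop above; this is a diagnostic sketch, assuming the cost lies in cross-thread heap bookkeeping, which is not something we have verified:

    // Drop-in replacement for the timed deallocation loop above.
    // Hypothesis being tested: freeing from multiple threads behaves
    // differently from freeing everything on one thread.
    clock_t start = clock();
    #pragma omp parallel for
    for (int i = 0; i < nhist; ++i) {
        delete dataY[i];
        delete dataE[i];
        delete dataX[i];
    }
    clock_t now = clock();
    // Note: clock() in the Microsoft CRT reports elapsed wall time,
    // so the two timings remain directly comparable on Windows.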
Can anyone offer an explanation for this significant difference?
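For completeness: if this turns out to be inherent behaviour of the heap when many small blocks are allocated concurrently, the workaround we are considering is to collapse the 3 × 70,000 small allocations into three contiguous buffers, so that teardown frees only three blocks. A minimal sketch, assuming our access pattern can tolerate the flat layout (it changes the indexing, so it is not a drop-in replacement):

    #include <vector>
    #include <cstddef>

    int main() {
        const int nhist = 70000;
        const int nbins = 2000;
        const std::size_t total = static_cast<std::size_t>(nhist) * nbins;

        // One contiguous block per data set; histogram i occupies
        // elements [i * nbins, (i + 1) * nbins).
        std::vector<double> dataY(total);
        std::vector<double> dataE(total);
        std::vector<double> dataX(total);

        // ... fill and use the data, in parallel or not, as before ...

        // Teardown now frees three large blocks instead of 210,000 small
        // ones, which should make the deallocation time independent of
        // how (and from how many threads) the data was written.
        return 0;
    }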