Nasher wrote:
PolarBear wrote:
I concur with OneDay on this - it's likely that the memory bandwidth of your machine is just not enough given the number of threads running and that each thread constantly accesses different areas of memory.
As I mentioned above, it gets faster when the data arrays have smaller bounds. Given what OneDay said about the caching, this makes me think that it's tossing around the entirety of those arrays when there is work being done on just a few elements therein. I'm hoping that marshalling just the specific elements I need to work with will bypass that tossing.
edit: I thought my previous post didn't post, so I've written some of the same things again.
An array is always stored in one contiguous section of memory, so accessing an element of an array does not pull the whole array into the cache. .NET has a reference to the start of the array, it has an index, and it knows the size of the elements in the array. It can therefore jump straight to the element it needs. It will not read the whole array, and as most CPU caches are < 10 MB, it couldn't hold anything near the whole array anyway if the array is large.
How the hardware decides what to pull into CPU cache, other than what was requested, comes down to granularity and prefetching. Memory moves into the cache in fixed-size cache lines, typically 64 bytes on current x86 and ARM parts, and hardware prefetchers may speculatively pull in adjacent lines when they spot a sequential access pattern. The cache doesn't look at what a line contains, or whether it's in any way related, it's just one block of memory. The logic being, that if you're looking at something in that area of memory, then you'll probably want something else in that area of memory as well. This is especially true when working with objects, as all the object's value-type fields are stored in a single block of memory. Caching whole lines also means one cache tag covers many bytes of memory, so the cache's lookup structures don't need to be as big.
This brings me onto my next point. An array of objects is an array of object references. That means that the objects themselves have no locality to the array. You'll just be jumping all over the place with an array of objects. You need to use an array of structs instead, so that all the structs are stored adjacently in a single block of memory.
As said, assuming you're working with some kind of tree that you iterate through, you want to have adjacent nodes as close to each other as possible in the array. Because each level of a tree doubles in size, trees are horrible structures for cache locality, but there's not much you can do about that.
So to sum it up, don't bother with marshalling, just use an array of structs.