If this is your first visit, be sure to check out the
FAQ by clicking the link above. You may have to
register or
login before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.
| 24 JAN 2003 at 6:21am |
Agustín CordesGuild Master


Posts : 5696 Joined: 23 OCT 2002 Location: AR, Buenos Aires
Status : Offline | Somehow I think the second one but I can't see why it makes a difference. Perhaps because of the way stuff is stored in RAM and fast cache? Very interesting - I really want to know the answer to this one!
|
| 24 JAN 2003 at 6:38am |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By Rael (24 JAN 2003 6:21am) Perhaps because of the way stuff is stored in RAM and fast cache? You're on the right track there...
For reference, here's C code for the first variant (except for 5000x5000 array and I'm afraid the indentation got screwed up):
#define ROWS 5000 #define COLS 5000
void main (void) { int  *a)[ROWS]; int i, j;
a = (int*)malloc(COLS * ROWS * sizeof(int)); for (i = 0; i < ROWS; i++) for (j = 0; j < COLS; j++) a[j] = 0; }
|
In the second variant the loops look like this:
for (i = 0; i < COLS; i++) for (j = 0; j < ROWS; j++) a[j] = 0;
|
For bonus points, report the results you get on your PC.
I forgot my sig.
|
| 24 JAN 2003 at 4:48pm |
Agustín CordesGuild Master


Posts : 5696 Joined: 23 OCT 2002 Location: AR, Buenos Aires
Status : Offline | Originally Posted By MichalN (24 JAN 2003 6:38am) You're on the right track there... OK, let's see. Even though the array is like a matrix, those values are stored in RAM as a stream.
Array:
1:1000 integers 2:1000 . . 1000:1000
RAM:
1:1000 2:1000 ... 1000:1000
In the first case, the pointer starts setting the first entire row, then jumps to the 2nd. row and repeats.
However, in the 2nd. case, the pointer starts at the first element of the first row, then "jumps" to the 2nd. element of the 2nd. row until it reaches the n-element of the n row. Then it continues at the 2nd. element of the first row and repeats. I guess those "jumps" between several memory addresses can hit the performace.
So I was wrong The first case is the most optimal.
For bonus points, report the results you get on your PC. OK, maybe I will - I'm feeling too lazy today Anyway, I don't think if I'll see any difference. In both cases the process is very fast. I'll need a very accurate timer to get any good results.
|
| 24 JAN 2003 at 6:22pm |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By Rael (24 JAN 2003 4:48pm) So I was wrong The first case is the most optimal. Correct. Now do you know why exactly this is? It has everything to do with the memory/cache subsystem...
Anyway, I don't think if I'll see any difference. In both cases the process is very fast. I'll need a very accurate timer to get any good results. Just make the array bigger. You will see the difference with naked eye, trust me on that.
I forgot my sig.
|
| 25 JAN 2003 at 2:21am |
Agustín CordesGuild Master


Posts : 5696 Joined: 23 OCT 2002 Location: AR, Buenos Aires
Status : Offline | Originally Posted By MichalN (24 JAN 2003 6:22pm)
Correct. Now do you know why exactly this is? It has everything to do with the memory/cache subsystem... Honestly, no. Yes, I'm sure it have something to do with cache. Hold it - suddenly the words "sequential access" came to mind. Hmmmm...
|
| 25 JAN 2003 at 2:35am |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | sequential access But you're writing not reading...
Edit Follows:
Hmmm -  amn interesting question... I could be way off here (as I don’t do a lot of PC programming...) But, I'd guess it has something to do with CAS & RAS...
Again just a WAG as I don't work intimately with PC's.
|
| 25 JAN 2003 at 2:47am |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 2:35am) Hmmm -  amn interesting question... I could be way off here (as I don�t do a lot of PC programming...) But, I'd guess it has something to do with CAS & RAS... Not really... Rael is on the right track. It has to do with how the memory and cache subsystems are laid out.
The same effect is likely to occur on workstation class hardware too (PPC, HP-PA, Alpha AXP, SPARC, Itanic etc.). If you guys give up, I'll explain
I forgot my sig.
|
| 25 JAN 2003 at 2:52am |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | If a sequential "write" process is more or less efficient - it has to do with "how" memory is written to. (The CAS RAS thing - but, maybe my terminology is a bit off.)
|
| 25 JAN 2003 at 3:55am |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 2:52am) If a sequential "write" process is more or less efficient - it has to do with "how" memory is written to. (The CAS RAS thing - but, maybe my terminology is a bit off.) CAS/RAS is Column/Row Address Strobe which is the electrical signal...
But how about "cache line"? Or "memory interface width"? Or, even better, "cache thrashing"?
Also note that the effect will be pretty much the same when reading from the array. Same reason.
I forgot my sig.
|
| 25 JAN 2003 at 3:59am |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | But how about "cache line"? Or "memory interface width"? Or, even better, "cache thrashing"?
Ohhh - I'm getting a... well, you know what I'm getting...
|
| 25 JAN 2003 at 4:01am |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 3:59am)
Ohhh - I'm getting a... well, you know what I'm getting... Headache? Idea? ********? Something else? No, I don't know
I forgot my sig.
|
| 25 JAN 2003 at 4:10am |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | We're keeping the rating to a "G" for general here... (but one of your answers was the right one...)
|
| 25 JAN 2003 at 4:22am |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 4:10am) We're keeping the rating to a "G" for general here... (but one of your answers was the right one...) Okay, I hope it wasn't headache then
I must admit that I'm not closely acquainted with the US ratings system; when I arrived here I wasn't in an age to care much about it, other than out of pure curiosity.
So what was the idea?
I forgot my sig.
|
| 25 JAN 2003 at 4:41am |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | Good Save MichalN... ********
The Idea was how the bits were "electronically" populated (and, had more to do with the latency involved - which would be more hardware specific that your program).
Off base I think - But, that's just me - sometimes my mind just wanders...
|
| 25 JAN 2003 at 6:09pm |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 4:41am) The Idea was how the bits were "electronically" populated (and, had more to do with the latency involved - which would be more hardware specific that your program). No, it has nothing to do with the actual bits... OK, I'll explain.
Problem #1 in today's computers is that RAM is dead slow compared to CPUs. Even DDR SDRAM is absolutely no match for CPU speeds in the GHz range. The bandwidth isn't great and what's worse, the latency is terrible.
To work around the problem, two main methods are employed:
1. Wider memory interfaces. A 486 accessed memory 32 bits at a time, today's computers may do 256 bits at once. This improves the bandwidth but does nothing to fix latency.
2. Caching. Typically L1 and L2, sometimes L3 caches are implemented, with increasing size and decreasing speed. Caches are fast but by definition small.
Now the trick is that the memory subsystem usually delivers highest performance when the access pattern is sequential. An important thing to remember about caches is that they usually are organized into units called cache lines, typically 32 bytes. A cache always reads/writes an entire line.
So when reading from RAM, you always have to read the entire cache line. If you access memory sequentially, you will use all that data immediately. If you are "jumping around" in the memory, many of the reads will be wasted.
In the case of writing what happens is this: after a short while the caches will be populated with "dirty" lines. Each new write will need to fetch another, not yet cached line and write back a line to make space for the new one. So each write in the program will cause physical memory access, but a whole cache line will be read/written even though you modified perhaps only one byte.
If the access is sequential, the cache utilization is optimal and maximum performance is achieved because no reads or writes are wasted and several CPU R/W operations will be physically merged.
On my PIII machine the sequential algorithm runs about 3x faster than the other variant. That's a huge difference! On an old AT there would be no difference at all.
Did this make sense?
I forgot my sig.
|
| 25 JAN 2003 at 6:28pm |
InlandAZGuild Master


Posts : 5586 Joined: 4 MAY 2007
Status : Offline | It's makes absolute sense - As soon as you mentioned thrashing, I went: Doooooough...
|
| 25 JAN 2003 at 7:32pm |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By InlandAZ (25 JAN 2003 6:27pm) It's makes absolute sense - As soon as you mentioned thrashing, I went: Doooooough... If you are familiar with any caching or paging system (I'm sure you are), the same effect can occur there.
I forgot my sig.
|
| 25 JAN 2003 at 10:25pm |
Agustín CordesGuild Master


Posts : 5696 Joined: 23 OCT 2002 Location: AR, Buenos Aires
Status : Offline | Brilliant! I will try this at work this monday and get back with the results
|
| 2 FEB 2003 at 8:22pm |
MichalNGrand Inquisitor


Posts : 7058 Joined: 14 SEP 2003
Status : Online | Originally Posted By Rael (25 JAN 2003 10:24pm) Brilliant! I will try this at work this monday and get back with the results So, Rael - what were the results?
I forgot my sig.
|