
AMD X2

#2
The X2s are desktop processors. They use the current Socket 939 interface, so you should be able to pop them into a current Socket 939 A64 board (with a BIOS update, presumably).

According to AMD, the A64 FX is still the best choice for gaming. The X2 is supposed to offer better performance in productivity and digital media apps.

For servers and workstations, the Opterons would be more appropriate.
 

Sazar

F@H - Is it in you?
Staff member
Political User
#3
Both workstations and personal use.

Wrt gaming (and similar applications), most games are still single-threaded, so a dual-core processor will not make a difference. Only games programmed with multi-threading in mind will see performance boosts.

Don't expect a mind-blowing performance difference immediately, because single-threaded apps are still the majority.
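A toy sketch of the point above (Python purely for illustration; the workload names are made up, and Python threads won't show a real CPU speedup because of the GIL): a program that splits its work into native threads gives the OS scheduler something to put on the second core, while a single-threaded program leaves that core idle.

```python
import threading

# Structure-only illustration: two independent workloads run as separate
# threads. On a dual-core CPU the OS can schedule each native thread on
# its own core; a single-threaded program offers the scheduler only one
# thread, so the second core has nothing to do.

def encode_video(result):
    result["video"] = sum(i * i for i in range(10_000))  # stand-in work

def scan_files(result):
    result["scan"] = sum(i for i in range(10_000))       # stand-in work

result = {}
workers = [
    threading.Thread(target=encode_video, args=(result,)),
    threading.Thread(target=scan_files, args=(result,)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(result))  # -> ['scan', 'video']
```

The same structure applies whether the "workloads" are threads inside one multi-threaded game or two separate apps running at once.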
 
#4
The X2 will mostly benefit people who run a large number of apps at any one time (me, for instance), but the same was true of dual-CPU setups before they decided to put two cores on one chip.

You will get a little extra out of a multi-processor Opteron system, but it will cost you more due to the need for ECC RAM and the 1 MB cache in the Opteron.

However, you will have the advantage of being able to obliterate the nearest G5.
 

Son Goku

No lover of dogma
#5
Sazar said:
Wrt gaming (and similar applications), most games are still single-threaded, so a dual-core processor will not make a difference. Only games programmed with multi-threading in mind will see performance boosts.
Although essentially correct, there can be a slight (albeit negligible) difference due to the operating system's support for multiple processors, and whether device drivers can make some use of it...

Anyhow, here's a tidbit some people might find interesting, concerning observations John Carmack made while working on multi-threaded support for the game himself... Though the games we're talking about are old, his observations might be worth noting...

http://doom-ed.com/john-carmack/dual-processor-acceleration-for-quakearena.html

I recently set out to start implementing the dual-processor acceleration
for QA, which I have been planning for a while. The idea is to have one
processor doing all the game processing, database traversal, and lighting,
while the other processor does absolutely nothing but issue OpenGL calls.

This effectively treats the second processor as a dedicated geometry
accelerator for the 3D card. This can only improve performance if the
card isn't the bottleneck, but voodoo2 and TNT cards aren't hitting their
limits at 640*480 on even very fast processors right now.

For single player games where there is a lot of cpu time spent running the
server, there could conceivably be up to an 80% speed improvement, but for
network games and timedemos a more realistic goal is a 40% or so speed
increase. I will be very satisfied if I can make a dual pentium-pro 200
system perform like a pII-300.

I started on the specialized code in the renderer, but it struck me that
it might be possible to implement SMP acceleration with a generic OpenGL
driver, which would allow Quake2 / sin / halflife to take advantage of it
well before QuakeArena ships.

It took a day of hacking to get the basic framework set up: an smpgl.dll
that spawns another thread that loads the original opengl32.dll or
3dfxgl.dll, and watches a work queue for all the functions to call.

I get it basically working, then start doing some timings. It's 20%
slower than the single processor version.

I go in and optimize all the queuing and working functions, tune the
communications facilities, check for SMP cache collisions, etc.

After a day of optimizing, I finally squeak out some performance gains on
my tests, but they aren't very impressive: 3% to 15% on one test scene,
but still slower on another one.

This was fairly depressing. I had always been able to get pretty much
linear speedups out of the multithreaded utilities I wrote, even up to
sixteen processors. The difference is that the utilities just split up
the work ahead of time, then don't talk to each other until they are done,
while here the two threads work in a high bandwidth producer / consumer
relationship.

I finally got around to timing the actual communication overhead, and I was
appalled: it was taking 12 msec to fill the queue, and 17 msec to read it out
on a single frame, even with nothing else going on. I'm surprised things
got faster at all with that much overhead.
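The work-queue proxy described above can be sketched in miniature. This is a hedged Python illustration, not Carmack's code: gl_vertex is a made-up stand-in for a real OpenGL entry point, and a real driver proxy would batch call records into shared buffers rather than pay per-call queue overhead, which is exactly where his 12/17 msec went.

```python
import threading
import queue

# Minimal stand-in for the smpgl-style proxy: the game thread enqueues
# (function, args) records instead of calling the driver directly, and a
# second thread drains the queue and issues the buffered calls in order.

calls = []  # record of issued "GL" calls, for demonstration

def gl_vertex(x, y, z):
    """Stand-in for a driver entry point such as glVertex3f."""
    calls.append(("glVertex3f", x, y, z))

work = queue.Queue()

def driver_thread():
    while True:
        item = work.get()
        if item is None:       # sentinel: stream of calls is finished
            break
        func, args = item
        func(*args)            # replay the buffered call

t = threading.Thread(target=driver_thread)
t.start()

# Producer side: the game thread buffers calls instead of making them.
for i in range(3):
    work.put((gl_vertex, (float(i), 0.0, 0.0)))
work.put(None)
t.join()
print(len(calls))  # -> 3
```

One queue.Queue item per call is the naive version; the post goes on to describe why that communication cost dominated.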
Some of Carmack's observations from his experimenting, in terms of what could be done, follow:

What is really needed for this type of interface is a streaming read cache
protocol that performs similarly to the write combining: three dedicated
cachelines that let you read or write from a range without evicting other
things from the cache, and automatically prefetching the next cacheline as
you read.

Intel's write combining modes work great, but they can't be set directly
from user mode. All drivers that fill DMA buffers (like OpenGL ICDs...)
should definitely be using them, though.

Prefetch instructions can help with the stalls, but they still don't prevent
all the wasted cache evictions.

It might be possible to avoid main memory altogether by arranging things
so that the sending processor ping-pongs between buffers that fit in L2,
but I'm not sure if a cache coherent read on PIIs just goes from one L2
to the other, or if it becomes a forced memory transaction (or worse, two
memory transactions). It would also limit the maximum amount of overlap
in some situations. You would also get cache invalidation bus traffic.

I could probably trim 30% of my data by going to a byte level encoding of
all the function calls, instead of the explicit function pointer / parameter
count / all-parms-are-32-bits that I have now, but half of the data is just
raw vertex data, which isn't going to shrink unless I did evil things like
quantize floats to shorts.

Too much effort for what looks like a relatively minor speedup. I'm giving
up on this approach, and going back to explicit threading in the renderer so
I can make most of the communicated data implicit.
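The "quantize floats to shorts" trick he mentions at the end can be shown in a few lines. The scale factor and range below are assumptions for illustration, not values from the .plan:

```python
import struct

SCALE = 8.0  # assumed: 1/8-unit precision, coordinates within roughly +/-4096

def quantize(x):
    """Map a float coordinate to a signed 16-bit integer (2 bytes vs. 4)."""
    q = int(round(x * SCALE))
    return max(-32768, min(32767, q))  # clamp to short range

def dequantize(q):
    return q / SCALE

# Three vertex components packed as shorts: 6 bytes instead of 12.
packed = struct.pack("<3h", *(quantize(v) for v in (1.0, 2.5, -3.125)))
print(len(packed))                                        # -> 6
print([dequantize(q) for q in struct.unpack("<3h", packed)])
```

The "evil" part is the lost precision: any coordinate not on a 1/8-unit grid comes back rounded, which is why he hesitated to do it to raw vertex data.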
 
