RealPlayer 11 Enables Easy Video Downloads

RealNetworks announced a new version of its RealPlayer media player software. Its makers claim RealPlayer 11 will be the first media player to offer one-click video downloads of embedded web-based content. The online player will also support rival formats Flash and QuickTime.

The concept capitalizes on the popularity of online video sites such as YouTube. Rob Glaser, chairman and CEO of Real, said, “The new player gives consumers more control of internet video. By floating a ‘Download this video’ button next to video seen on thousands of websites, RealPlayer makes it one-click simple.”

Until now, online video content has primarily been limited to streaming. Real isn’t the first company to enable video downloads, but it hopes the straightforward ‘Download this’ approach will quickly catch on. Customers of Apple’s iTunes can already download video clips and video podcasts for free.

RealPlayer 11 enables users to watch videos, burn them to DVD and CD and share them easily. As well as downloading online video, users will be able to send links to friends.

Available later this month, a beta version of RealPlayer 11 will be downloadable for free from

Future versions of RealPlayer, currently in development, will enable users to transfer downloaded video content to a personal media player. At present, grabbing online video content for playback on a portable device requires the portable device to have specific video codecs or third-party transcoding software.


Processor power consumption has been spiraling upward, and both team blue (Intel) and team green (AMD) have made attempts to address the issue. The first option was to throttle back the speed of the processor when it is running at low utilization. This makes a lot of sense when you think about it: why do you need a CPU running at full speed and sucking up a lot of energy when the machine is sitting idle?

Moving to a smaller CPU die manufacturing process usually helps, but if you’re trying to save an architecture that has a fatal flaw (like the Pentium 4), it is more of a band-aid move. Intel eventually scrapped the Netburst architecture in favor of the Banias (Pentium M) design, later revised to become the Core 2 Duo. AMD, on the other hand, has stuck with its K8 architecture because it doesn’t suffer from the same degree of energy and heat problems.

AMD hasn’t moved as quickly to new manufacturing processes, sticking with 90 nanometer for quite a while. Today the money is with 65nm SOI processor manufacturing, and with the AMD Athlon64 X2 4800+ the company has finally done the die shrink.

At the moment Intel might have the performance lead with the Core 2 Duo; however, AMD’s overall processor lineup is still more attractive to mainstream users because, well… there aren’t many cheap Intel Core 2 Duo processors yet. What’s nice about AMD’s dual core K8 CPU lineup is that you can find a dual core processor quite easily, no matter what your budget is. The range runs from the mainstream Athlon64 X2 3600+ to the high end X2 6000+.

The Socket AM2 AMD Athlon64 X2 4800+ processor offers us a decent mix of performance and value. With a retail price of $253 CDN ($215 US, £112 GBP), it certainly won’t break the bank either. The Athlon64 X2 4800+ is built on AMD’s 65 nanometer manufacturing process, which means it will help to address the power and heat issue that PCSTATS has been commenting on for the last six months.

AMD Athlon64 X2 4800

The Socket AM2 AMD Athlon64 X2 4800+ processor comes clocked at a cool 2.5 GHz, with a 12.5x CPU clock multiplier (12.5 x 200 MHz = 2500 MHz). Each core in this dual core Athlon64 X2 4800+ CPU has a 128KB L1 cache along with a 512KB L2 cache. This is a bit different from what AMD did with its Socket 939 series Athlon64 X2 4800+, which ran at 2.4 GHz and had 1MB L2 cache per core.
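
The clock and cache figures above are simple arithmetic, and a short sketch makes the comparison between the two 4800+ parts easier to see. The function names here are ours, chosen for illustration, and the Socket 939 multiplier of 12x is an assumption that follows from its 2.4 GHz clock only if that chip uses the same 200 MHz reference clock.

```python
REFERENCE_CLOCK_MHZ = 200  # reference clock quoted for the AM2 part


def core_clock_mhz(multiplier, reference_mhz=REFERENCE_CLOCK_MHZ):
    """Core clock in MHz = CPU multiplier x reference clock."""
    return multiplier * reference_mhz


def total_l2_kb(cores, l2_kb_per_core):
    """Combined L2 cache across all cores, in KB."""
    return cores * l2_kb_per_core


# Socket AM2 X2 4800+: 12.5x multiplier, 512KB L2 per core.
am2_clock = core_clock_mhz(12.5)
am2_l2 = total_l2_kb(2, 512)

# Socket 939 X2 4800+: 2.4 GHz, 1MB L2 per core.
# The 12x multiplier is assumed, not stated in the review.
s939_clock = core_clock_mhz(12)
s939_l2 = total_l2_kb(2, 1024)

print(f"AM2:  {am2_clock:.0f} MHz, {am2_l2} KB total L2")
print(f"S939: {s939_clock:.0f} MHz, {s939_l2} KB total L2")
```

In other words, the AM2 chip gains 100 MHz of clock speed but carries half the total L2 cache of its Socket 939 namesake.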

AMD’s Athlon64 X2 4800+ processor is compatible with all Socket AM2 motherboards on the market, although a BIOS flash might be necessary for the board to recognize the processor properly. The bus and HyperTransport are the same, but some newer model numbers are not always recognized by older motherboards. The integrated memory controller supports up to 8GB of DDR2-800 memory running in a dual channel configuration.

Like all available AMD processors based on the K8 core, the Socket AM2 Athlon64 X2 4800+ is x86-64 compatible. It can natively run both 32 bit and 64 bit software, including Microsoft Windows XP x64 Edition and the 32 and 64 bit versions of Microsoft Windows Vista. The 940-pin Athlon64 X2 4800+ that PCSTATS will be testing is based on the “AAAFG” stepping (K8 ‘Brisbane’ core) and was built in week 44 of 2006.


Has there ever been a PC enthusiast product with as much anticipation as the AMD R600 GPU core? Probably, but right now it’s hard to think of one. The R600 core (now officially named the AMD Radeon HD 2000 series) has been discussed since the R580 (ATI’s Radeon X1900 XTX) was in its infancy, and that was in January of 2006. As you might imagine, in an industry where 6-month product cycles became the norm, ATI’s stale product line was beginning to cause some concerns and raise some eyebrows. Had NVIDIA finally pushed them out of the graphics market for good with the GeForce 7-series and new 8-series of cards?

Of course we knew better, and AMD (who purchased ATI last year) tells us they are in the graphics market for the long haul, in enthusiast products and the upcoming Fusion products that combine the work of the GPU and CPU technologies.  For today though, we are given the R600, a graphics architecture that was built well before AMD’s acquisition of ATI and falls under the same GPU industry values – performance for gamers.

Architecture Overview

The R600 architecture is a major step beyond the R580 architecture we saw in ATI’s X1900 XT graphics cards, and it heads down the road of unified shaders, a route required by Microsoft’s DirectX 10 and first implemented in the G80 architecture from NVIDIA. AMD claims that the new HD 2000 series of graphics cards will take the best features of the X1000 series (like dynamic branching and stream computing), combine them with advantages from the Xbox 360’s Xenos GPU (such as unified shaders and stream out) and add new technology (like DX10 support, a superscalar shader processor and an updated dispatch processor) to make a truly amazing product.

At least that’s what they claim.

ATI Radeon HD 2900 XT

Here is the very big, very complex diagram of the new architecture behind the R600, specifically the flagship HD 2900 XT card that we are reviewing here today.  I’ll attempt to break this down piece by piece with more detailed descriptions and explanations of what the various components we are seeing here actually do.

The ATI Radeon HD 2900 XT card (which again is the one shown above) includes 320 stream processors, 4 SIMD units, 4 texture units and 4 render back ends (ROPs).  The modular design of the core allows the lower models like the HD 2600 and HD 2400 to have a subset of those units and run with the same features.  We’ll detail the product lineup towards the end of the detailed discussion.

The very first unit that data must pass through when entering the GPU is the command processor, which is responsible for interacting with the stream of data from the graphics driver. AMD is claiming that the upgraded unit can offload as much as 30% of the CPU overhead in batch improvements from the driver, thus allowing for better overall performance.

The setup engine is the unit responsible for preparing the data and organizing it for processing by the various SPUs (stream processing units). It mainly performs three functions, one for each type of processing work the SPUs might do: vertex assembly and tessellation, geometry assembly, and scan conversion for pixel shaders. Each of these can submit threads to the dispatch processor sitting below it.

ATI Radeon HD 2900 XT

The ultra-threaded dispatch processor is one of the most complex parts of the new R600 architecture and has a lot of critical logic built into it that performance depends on. Its main function is to order the various “threads” that have been created and send them off for processing. Each thread includes a list of instructions to be executed on some data (be it textures, pixels, etc).

The 320 stream processors are divided into four 80-SPU blocks that each depend on a set of arbiters and sequencers responsible for selecting the thread to submit. In connection with the SIMD arrays, the dual arbiters allow two operations at a time to be processed by the SPUs, thus indicating a superscalar architecture at work, mostly. Each of these threads can be “bumped” and its state saved in order to allow more critical threads to pass through, and it can resume later at the dispatcher’s whim. In part to aid in latency hiding, the ability for a thread to be “bumped” keeps it from stalling the pipeline if it is waiting for memory access before it can continue processing.
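
To make the “bumping” idea a little more concrete, here is a toy simulation of a single issue slot shared between threads. This is only a sketch of the general latency-hiding technique and not AMD’s actual dispatch logic: the `run` function, the two op types and the three-cycle memory latency are all invented for illustration, and the real hardware juggles far more threads with dual arbiters per SIMD array.

```python
from collections import deque


def run(threads, memory_latency=3):
    """Simulate one issue slot shared by several threads.

    threads maps a thread name to its list of ops ("alu" or "mem").
    Returns the per-cycle issue log; None marks an idle cycle.
    """
    ready = deque((name, list(ops)) for name, ops in threads.items() if ops)
    waiting = []   # (wake_cycle, name, remaining_ops) for bumped threads
    timeline = []
    cycle = 0
    while ready or waiting:
        # Threads whose memory access has returned become ready again.
        for entry in [w for w in waiting if w[0] <= cycle]:
            waiting.remove(entry)
            ready.append((entry[1], entry[2]))
        if ready:
            name, ops = ready.popleft()
            op = ops.pop(0)
            timeline.append(name)
            if ops and op == "mem":
                # Bump the thread: park it until its data arrives.
                waiting.append((cycle + memory_latency, name, ops))
            elif ops:
                # No stall, so the same thread keeps the issue slot.
                ready.appendleft((name, ops))
        else:
            timeline.append(None)   # nothing ready: the slot is wasted
        cycle += 1
    return timeline


alone = run({"A": ["alu", "mem", "alu"]})
shared = run({"A": ["alu", "mem", "alu"], "B": ["alu", "alu"]})
print("idle cycles, one thread: ", alone.count(None))
print("idle cycles, two threads:", shared.count(None))
```

With only thread A in flight, the slot sits idle while A waits on memory. Add thread B and the dispatcher fills those cycles with B’s work, which is the whole point of keeping many threads available.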

You will also notice on the right hand side that dedicated caches exist for shader constants and instructions to allow for unlimited shader program length. On the left hand side of the SPUs you’ll find the all-important memory read/write cache, which allows for interthread communication and is a requirement of the new DX10 architecture. The stream out feature allows the shaders to bypass the ROPs and color buffer or to output sequential data instead of bitmaps; useful for that whole GPGPU thing.

The SIMD arrays of the R600 architecture are still a bit confusing to me.  AMD claims that they use VLIW (very long instruction word) design to include up to six operations (5 math and 1 flow control) and all six can be performed in parallel on each data element in that current thread.  Note though that the vertex and texture fetching is done separately and does not attempt to take advantage of these SIMD benefits.

ATI Radeon HD 2900 XT

Here is a detailed shot of a single stream processing unit. You’ll notice that in the diagram above (the big one) there are only 64 of these; so where do the 320 SPUs AMD claims come from? Well, there are five SPUs inside this block and 5 x 64 = 320, so there you go. Now, some people (namely NVIDIA) will claim that the comparison to the G80’s 128 SPUs is unfair because of the amount of functionality they produce. It’s a very complicated debate that goes deep into the theories of CPU design so we’ll leave it at this: AMD claims 320 SPUs but performance results are what matters.

In any event, the arranged 5-way superscalar shader processor can issue up to five FP MAD (floating point Multiply-Add) instructions, IF the thread is able to fill that wide of a processing request.  Each SPU is fully 32-bit FP but still supports integer operations (of course).  The little branch unit on the right is there to handle flow control commands and just as with the R580 can eliminate flow control performance overhead.  Finally, the general purpose registers store the input/output data to share their processing results.
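
The big “IF” above is worth a quick sketch. The block and lane counts come from the diagram (64 blocks of five SPUs each), but the helper names are ours, and the model is deliberately naive: it assumes every block packs the same number of independent operations per clock, which a real shader workload would not do.

```python
BLOCKS = 64          # shader blocks shown in the architecture diagram
LANES_PER_BLOCK = 5  # SPUs inside each block


def stream_processors(blocks=BLOCKS, lanes=LANES_PER_BLOCK):
    """Total SPU count as AMD counts them."""
    return blocks * lanes


def ops_per_clock(ops_packed, blocks=BLOCKS, lanes=LANES_PER_BLOCK):
    """Operations issued per clock when each block fills ops_packed lanes."""
    if not 0 <= ops_packed <= lanes:
        raise ValueError("ops_packed must be between 0 and the lane count")
    return blocks * ops_packed


def utilization(ops_packed, lanes=LANES_PER_BLOCK):
    """Fraction of the SPUs doing useful work."""
    return ops_per_clock(ops_packed) / stream_processors(lanes=lanes)


for packed in range(1, LANES_PER_BLOCK + 1):
    print(f"{packed} ops packed: {ops_per_clock(packed):3d} ops/clock, "
          f"{utilization(packed):.0%} of peak")
```

A thread that can only fill one of the five lanes leaves the chip behaving like a 64-SPU part, which is why the raw 320 figure only tells part of the story.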


AMD is definitely proud of its SPUs and mentioned to us over and over how the architecture could achieve 475 GigaFLOPS (billions of floating point operations per second). This table shows you how the 2900, 2600 and 2400 and a desktop processor line up in potential for raw processing power. This is exactly why the GPU Folding@Home and stream computing initiatives are gathering so much steam.
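
For the curious, the 475 GigaFLOPS figure can be reproduced with some back-of-the-envelope math. The sketch below counts each multiply-add as two floating point operations, which is the usual convention for peak numbers, and it assumes a 742 MHz core clock for the HD 2900 XT. That clock speed is not stated in this section, so treat it as our assumption.

```python
def peak_gflops(stream_processors, clock_mhz, flops_per_op=2):
    """Theoretical peak in GFLOPS.

    A multiply-add (MAD) counts as two floating point operations,
    and every stream processor is assumed to issue one MAD per clock.
    """
    return stream_processors * flops_per_op * clock_mhz / 1000.0


# 320 SPUs from the article; the 742 MHz core clock is an assumption.
hd2900xt = peak_gflops(320, 742)
print(f"HD 2900 XT theoretical peak: {hd2900xt:.1f} GFLOPS")
```

That lands right around AMD’s number, but remember this is a best case that requires every SPU to be issuing a MAD on every single clock.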

ATI Radeon HD 2900 XT

The R600’s four texture units each have eight address processors that execute shader instructions to control their lookups, twenty texture samplers that can fetch a single data value per clock and four floating point filter units that handle 64-bit and 128-bit bilinear filters. The HD 2900 and 2600 feature a two-level texture cache design (256KB for HD 2900 and 128KB for HD 2600) while the HD 2400 uses a single level cache.

AMD’s texture units can do 64-bit HDR texture bilinear filtering at full speed and 128-bit FP textures at half speed.  Interestingly, users that preferred the “High Quality” texture filtering setting in AMD’s previous graphics generation will like to know that in the R600, that quality setting is now the default.  NVIDIA’s G80 architecture implemented that as well last year. 

ATI Radeon HD 2900 XT

The render back-ends, as AMD is fond of calling them, are known as ROPs to the NVIDIA crowd and are responsible for post-processing, anti-aliasing and depth tests among other things. These can handle up to 32 pixels per clock in the stencil tests on the Radeon HD 2900, while the 2600 and 2400 models will only get 8 pixels per clock. A neat new feature is that the MSAA resolve functions are programmable and allow for a custom filter AA, something that AMD was keen to show off. We’ll cover that on a later page.

The depth, stencil and compression functions on the render back-ends were improved significantly over the X1000 series:

ATI Radeon HD 2900 XT

This simple array of stencil shadow tests shows that the HD 2000 series is seeing a 25-95% gain depending on the application.

Overall I’d say that the R600 architecture is actually much more similar to NVIDIA’s G80 than I expected, but with the requirements of unified shaders and other features from DirectX 10, it’s likely that being forced to meet them pushed both sets of engineers in the same general direction. AMD’s R600 does feature a superscalar design (versus the scalar architecture on the G80) and a new custom filtered AA option, though, that could help it be a success. We’ll have to see the benchmarks first before passing out any congratulatory cigars.

Updated Ring-bus Memory Controller

The new R600 architecture is also sporting a new version of the ring-bus memory controller used on the R580 design. This time the interface is actually 512 bits wide, surpassing the 384-bit bus that the GeForce 8800 has. The larger bus width allows AMD to get more bang out of existing memory technology by giving more of the GPU access to it at the same time. Memory can also run slower and thus cooler and still maintain the same bandwidth values as previous generations, should AMD decide to go that route.
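
The trade-off between bus width and memory speed is easy to put into numbers. In the sketch below, the bus widths come from the text, but the 2.0 GT/s data rate is a made-up round number used purely for illustration and is not the actual memory speed of either card.

```python
def bandwidth_gb_per_s(bus_width_bits, data_rate_gt_per_s):
    """Peak memory bandwidth: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_per_s


def matching_data_rate(bus_bits, reference_bus_bits, reference_rate):
    """Data rate a bus needs to equal the bandwidth of a reference bus."""
    return reference_rate * reference_bus_bits / bus_bits


HYPOTHETICAL_RATE = 2.0  # GT/s, illustrative only

wide = bandwidth_gb_per_s(512, HYPOTHETICAL_RATE)
narrow = bandwidth_gb_per_s(384, HYPOTHETICAL_RATE)
slower = matching_data_rate(512, 384, HYPOTHETICAL_RATE)

print(f"512-bit bus at {HYPOTHETICAL_RATE} GT/s: {wide:.0f} GB/s")
print(f"384-bit bus at {HYPOTHETICAL_RATE} GT/s: {narrow:.0f} GB/s")
print(f"512-bit bus matches the 384-bit bus at {slower} GT/s")
```

At equal memory speeds the 512-bit bus moves a third more data, or put the other way, it can run its memory 25% slower and still keep pace with a 384-bit design.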

ATI Radeon HD 2900 XT

In this version of the ring bus, the controller is fully distributed instead of having a central arbiter with crossbars as we saw in the R580. Because the design of these crossbars was not as flexible as the ring bus design itself, they were removed in favor of a model that ONLY works on the “stops” of the memory ring.

The memory controller on the HD 2900 XT flagship card is divided up into eight separate 64-bit memory channels for a total of 512 bits. Each channel has access to portions of the frame buffer (64MB each in this case) and memory accesses are done by circling around the ring bus to each “stop” to find the proper location. While it may sound inefficient at first, the bus moves quickly enough that AMD’s solution isn’t suffering any kind of drawbacks and in fact may result in the largest memory bandwidth numbers we’ve seen.
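
Here is a rough model of how a request might find its way around the ring. The channel count, channel width and 64MB slices come from the description above, but almost everything else is a simplification of ours: we treat each channel as its own stop, map the frame buffer to channels in contiguous 64MB slices, and send traffic around the ring in one direction only. The real controller’s address mapping and ring layout are not described here and are likely more involved.

```python
CHANNELS = 8             # separate memory channels
CHANNEL_WIDTH_BITS = 64  # width of each channel
SLICE_MB = 64            # frame buffer portion behind each channel


def total_bus_width_bits():
    """Combined width of all memory channels."""
    return CHANNELS * CHANNEL_WIDTH_BITS


def frame_buffer_mb():
    """Total frame buffer reachable through all channels."""
    return CHANNELS * SLICE_MB


def stop_for_offset(offset_mb):
    """Ring stop that owns a frame buffer offset (hypothetical mapping)."""
    if not 0 <= offset_mb < frame_buffer_mb():
        raise ValueError("offset is outside the frame buffer")
    return int(offset_mb // SLICE_MB)


def hops(from_stop, to_stop):
    """Stops passed when circling a one-directional ring."""
    return (to_stop - from_stop) % CHANNELS


print(f"Bus width: {total_bus_width_bits()} bits")
print(f"Frame buffer: {frame_buffer_mb()} MB")
target = stop_for_offset(130)
print(f"Offset 130MB lives at stop {target}, "
      f"{hops(6, target)} hops from stop 6")
```

Even in this worst-case style model a request never travels more than seven stops, which helps explain why a fast enough ring doesn’t turn into a bottleneck.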