
Dell UP3214Q Review
Last year I spent time with one of the first UltraHD monitors to come out and came away convinced of the benefits. Even though the screen size was not much larger than my usual display, the extra clarity and detail were totally worth it. It also sealed my decision to buy a MacBook Pro Retina when it was updated last fall. The field of UltraHD displays has since expanded considerably, so we now look at another 32” UltraHD display, the Dell UP3214Q.
Read More ...
GIGABYTE GA-6PXSV3 Review
Server motherboards, unlike consumer motherboards, are never bought for looks. It is all about function, and the GIGABYTE GA-6PXSV3 we are reviewing today is aiming to supply enough at the lower end of the extreme workstation segment. Here we have an ATX motherboard akin to our usual socket 2011 platform but with server level features such as Xeon/RDIMM ECC support, an ASpeed AST2300 remote management controller and a focus on virtualized environments.
Read More ...
ARM Partners Ship 50 Billion Chips Since 1991 - Where Did They Go?
A few weeks ago ARM celebrated its partners shipping over 10 billion ARM based chips in 2013. Since ARM collects a royalty on every chip shipped with its IP, it was a banner year for the company. The bigger story was that the 10 billion in 2013 brought the cumulative total for ARM based processors to over 50 billion (note that these are discrete ICs; multiple cores within a single design are not counted multiple times). ARM's press activities were limited to talking about the big final number, but ARM has a pretty broad IP portfolio. What I wanted was a breakdown of where the 50 billion went, so I asked.
What I got in response were tables of data. I was asked not to share specific numbers, but using the data in graphs was ok - and that's all I wanted to do. We'll start with where the 50 billion went in terms of markets (pictured above). Mobile obviously took the majority of shipments, followed by embedded markets. Remember that ARM cores are used all over the place, including in things like HDD and SSD controllers. The modems that work alongside the main apps processors in mobile devices are also frequently home to ARM processor IP. Intel's Quark project actually came about because Intel needed a low power/low cost core to use internally for various projects and eventually decided to offer it to anyone who wanted it. For those companies that don't have the desire/ability to build and validate their own low power CPU core, they often turn to ARM.
The enterprise slice may be a bit misleading depending on what you define as enterprise. We often refer to enterprise in terms of primary CPU shipments into servers. In this case we're talking about chips that go into things like routers and wireless access points. ARM obviously hopes to take a big portion of the high dollar enterprise CPU market with its ARMv8 based IP in the coming years, but it's not there yet.
The smallest slice, labeled home, is still nearly 3 billion shipments. Here we're talking about things like consumer set top boxes as well as wearables.
Note that 37.5 billion of the 50 billion chips shipped in the past five years (2009 - 2013). That shouldn't come as a surprise given the overlap between that time period and the rise of modern smartphones and tablets.
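To put those totals in proportion, here is a minimal sketch of the arithmetic. The three figures are the rounded numbers quoted in the text (ARM said "over" 10 billion and "over" 50 billion, so treat the results as approximate); the variable and function names are my own.

```python
# Rounded shipment figures quoted in the article, in billions of chips.
TOTAL_SHIPPED = 50.0        # cumulative, 1991 - 2013
SHIPPED_2009_2013 = 37.5    # past five years
SHIPPED_2013 = 10.0         # 2013 alone


def share(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


recent_share = share(SHIPPED_2009_2013, TOTAL_SHIPPED)
share_2013 = share(SHIPPED_2013, TOTAL_SHIPPED)
before_2009 = TOTAL_SHIPPED - SHIPPED_2009_2013

print(f"2009 - 2013: {recent_share:.0f}% of all chips shipped")
print(f"2013 alone:  {share_2013:.0f}% of all chips shipped")
print(f"1991 - 2008: {before_2009:.1f} billion chips")
```

In other words, roughly three quarters of everything ARM's partners have ever shipped went out in the last five years, and about a fifth in 2013 alone.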
While ARM wasn't willing to give me shipments by specific core, it was willing to give me family data:
Two thirds of all ARM mobile shipments are really old ARM7 and ARM9 based designs (remember my point about modems above). Here we get the first hint that the reign of the ARM11 designs (the foundation of the original iPhone) was a small blip in the grand scheme of things - the Cortex A family is really what allowed mobile to grow.
The embedded market is dominated by these lower power cores, although the newer Cortex M designs have made a huge dent. The same is true for the enterprise market, which is indicative of what I said earlier about ARM's enterprise market not yet including primary CPU sockets.
The trends are extremely telling. ARM7 (and ARM9) shipments peaked back in 2011 and have been in a slow decline ever since. Cortex M based designs have been skyrocketing since their introduction and show the most aggressive growth of any ARM line. The Cortex A line shows a similar slope over the past two years, with the ARM11 shipments crossing over in 2012.
The next two charts show the same data but focusing on the past 7 years and past 4 years, respectively:
You can easily correlate the rise in ARM's shipments with the explosion in mobile. It's also interesting to point out that, for the most part, shipments are growing with higher performing product families. A smart man once told me that no one wins by betting against performance. Although ARM definitely has its fair share of area and power optimized designs, ultimately it's the serious focus on performance that's been responsible for the surge over the past few years.
It's worth pointing out that although the shipment numbers we're talking about here are in the billions, there's a point to be made about margin. ARM pointed out that Cortex-A shipments overtook x86 in 2012, but with most Cortex-A based designs shipping at well below $30 it's important to put volume in context.
There's a real opportunity for ARM and its partners to start pushing for even higher end designs in my opinion. Thus far all of the talk about ARM enterprise CPUs has been focused on effectively repurposing smartphone designs for the datacenter. You could argue that the Cortex A57 is more enterprise focused than mobile focused, but the fact remains that it's still small/low power enough to get into a phone. I believe one of the next opportunities for disruption will be if ARM (and/or its partners) build a truly big core, something aimed exclusively at the enterprise (that could also be repurposed for notebook/desktop use). I've got to believe that all the big players in the ARM space are working on such a thing. And the implications of even moderate success of such a thing are pretty big (particularly if you look at the impact to server CPU ASPs).
Read More ...
Apple's Cyclone Microarchitecture Detailed
The most challenging part of last year's iPhone 5s review was piecing together details about Apple's A7 without any internal Apple assistance. I had less than a week to turn the review around and limited access to tools (much less time to develop them on my own) to figure out what Apple had done to double CPU performance without scaling frequency. The end result was an (incorrect) assumption that Apple had simply evolved its first ARMv7 architecture (codename: Swift). Based on the limited information I had at the time I assumed Apple simply addressed some low hanging fruit (e.g. memory access latency) in building Cyclone, its first 64-bit ARMv8 core. By the time the iPad Air review rolled around, I had more knowledge of what was underneath the hood:
As far as I can tell, peak issue width of Cyclone is 6 instructions. That’s at least 2x the width of Swift and Krait, and at best more than 3x the width depending on instruction mix. Limitations on co-issuing FP and integer math have also been lifted as you can run up to four integer adds and two FP adds in parallel. You can also perform up to two loads or stores per clock.
With Swift, I had the luxury of Apple committing LLVM changes that not only gave me the code name but also confirmed the size of the machine (3-wide OoO core, 2 ALUs, 1 load/store unit). With Cyclone however, Apple held off on any public commits. Figuring out the codename and its architecture required a lot of digging.
Last week, the same reader who pointed me at the Swift details let me know that Apple revealed Cyclone microarchitectural details in LLVM commits made a few days ago (thanks again R!). Although I empirically verified many of Cyclone's features in advance of the iPad Air review last year, today we have some more concrete information on what Apple's first 64-bit ARMv8 architecture looks like.
Note that everything below is based on Apple's LLVM commits (and confirmed by my own testing where possible).
Apple Custom CPU Core Comparison
|                           | Apple A6            | Apple A7              |
| CPU Codename              | Swift               | Cyclone               |
| ARM ISA                   | ARMv7-A (32-bit)    | ARMv8-A (32/64-bit)   |
| Issue Width               | 3 micro-ops         | 6 micro-ops           |
| Reorder Buffer Size       | 45 micro-ops        | 192 micro-ops         |
| Branch Mispredict Penalty | 14 cycles           | 16 cycles (14 - 19)   |
| Integer ALUs              | 2                   | 4                     |
| Load/Store Units          | 1                   | 2                     |
| Load Latency              | 3 cycles            | 4 cycles              |
| Branch Units              | 1                   | 2                     |
| Indirect Branch Units     | 0                   | 1                     |
| FP/NEON ALUs              | ?                   | 3                     |
| L1 Cache                  | 32KB I$ + 32KB D$   | 64KB I$ + 64KB D$     |
| L2 Cache                  | 1MB                 | 1MB                   |
| L3 Cache                  | -                   | 4MB                   |
I also noted an increase in overall machine size in my initial tinkering with Cyclone. Apple's LLVM commits indicate a massive 192 entry reorder buffer (coincidentally the same size as Haswell's ROB). Mispredict penalty goes up slightly compared to Swift, but Apple does present a range of values (14 - 19 cycles). This also happens to be the same range as Sandy Bridge and later Intel Core architectures (including Haswell). Given how much larger Cyclone is, a doubling of L1 cache sizes makes a lot of sense.
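For a quick sense of how much the machine grew, the sketch below computes Cyclone-to-Swift ratios for a few rows of the comparison table. The numbers come straight from the table; the dictionary keys and the `scaling` helper are names I made up for illustration.

```python
# Selected rows from the Apple A6 (Swift) vs. A7 (Cyclone) comparison table.
SWIFT = {
    "issue_width": 3,       # micro-ops
    "rob_entries": 45,      # micro-ops
    "l1_data_kb": 32,
    "integer_alus": 2,
    "load_store_units": 1,
}
CYCLONE = {
    "issue_width": 6,
    "rob_entries": 192,
    "l1_data_kb": 64,
    "integer_alus": 4,
    "load_store_units": 2,
}


def scaling(old, new):
    """Return new/old for every resource listed in `old`."""
    return {name: new[name] / old[name] for name in old}


ratios = scaling(SWIFT, CYCLONE)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Most resources double, but the reorder buffer grows by more than 4x, which is the clearest sign of how much deeper the out-of-order window has become.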
On the execution side Cyclone doubles the number of integer ALUs, load/store units and branch units. Cyclone also adds a unit for indirect branches and at least one more FP pipe. Cyclone can sustain three FP operations in parallel (including 3 FP/NEON adds). The third FP/NEON pipe is used for div and sqrt operations, so the machine can only execute two FP/NEON muls in parallel.
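The unit counts above can be turned into a very rough upper bound on how many micro-ops could issue in a single cycle for a given mix. This is a toy model of my own, not anything taken from Apple's LLVM commits: it ignores dependencies, buffer occupancy, and per-pipe restrictions (such as only two FP/NEON muls per cycle), and it treats each unit class as independent even though the article counts nine ports in total, so some classes presumably share ports.

```python
# Unit counts and issue width from the article; everything else is a
# simplifying assumption for illustration.
ISSUE_WIDTH = 6
UNITS = {
    "int": 4,
    "load_store": 2,
    "branch": 2,
    "indirect_branch": 1,
    "fp_neon": 3,
}


def peak_issue(ready):
    """Upper bound on micro-ops issued in one cycle.

    `ready` maps a unit class to the number of micro-ops of that class
    that are ready to execute this cycle.
    """
    per_class = sum(min(count, UNITS[kind]) for kind, count in ready.items())
    return min(ISSUE_WIDTH, per_class)


# Four integer adds plus two FP adds: the co-issue case from the text.
print(peak_issue({"int": 4, "fp_neon": 2}))
# A pure integer stream is limited by the four integer ALUs.
print(peak_issue({"int": 8}))
```

Even in this crude form, the model shows why the benefit of the 6-wide design depends on instruction mix: a stream made up of a single class of operation is limited by that class's unit count rather than by the issue width.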
I also found references to buffer sizes for each unit, which I'm assuming are the number of micro-ops that feed each unit. I don't believe Cyclone has a unified scheduler ahead of all of its execution units and instead has statically partitioned buffers in front of each port. I've put all of this information into the crude diagram below:
Unfortunately I don't have enough data on Swift to really produce a decent comparison image. With six decoders and nine ports to execution units, Cyclone is big. As I mentioned before, it's bigger than anything else that goes in a phone. Apple didn't build a Krait/Silvermont competitor, it built something much closer to Intel's big cores. At the launch of the iPhone 5s, Apple referred to the A7 as being "desktop class" - it turns out that wasn't an exaggeration.
Cyclone is a bold move by Apple, but not one that is without its challenges. I still find that there are almost no applications on iOS that really take advantage of the CPU power underneath the hood. More than anything Apple needs first party software that really demonstrates what's possible. The challenge is that at full tilt a pair of Cyclone cores can consume quite a bit of power. So for now, Cyclone's performance is really used to exploit race to sleep and get the device into a low power state as quickly as possible. The other problem I see is that although Cyclone is incredibly forward looking, it launched in devices with only 1GB of RAM. It's very likely that you'll run into memory limits before you hit CPU performance limits if you plan on keeping your device for a long time.
It wasn't until I wrote this piece that Apple's codenames started to make sense. Swift was quick, but Cyclone really does stir everything up. The earlier than expected introduction of a consumer 64-bit ARMv8 SoC caught pretty much everyone off guard (e.g. Qualcomm's shift to vanilla ARM cores for more of its product stack).
The real question is where does Apple go from here? By now we know to expect an "A8" branded Apple SoC in the iPhone 6 and iPad Air successors later this year. There's little benefit in going substantially wider than Cyclone, but there's still a ton of room to improve performance. One obvious example would be through frequency scaling. Cyclone is clocked very conservatively (1.3GHz in the 5s/iPad mini with Retina Display and 1.4GHz in the iPad Air). Assuming Apple moves to a 20nm process later this year, it should be possible to gain some performance by raising clock speeds without a power penalty. I suspect Apple has more tricks up its sleeve than that however. Swift and Cyclone were two tocks in a row by Intel's definition; a third in 3 years would be unusual but not impossible (Intel sort of committed to doing the same with Saltwell/Silvermont/Airmont in 2012 - 2014).
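To give a feel for what frequency scaling buys, here is a small sketch that multiplies issue width by clock speed to get a theoretical peak issue rate. The clocks and the 6-wide figure are from this article; the function names are mine, and peak issue rate is only a ceiling that real code never sustains, so read the results as relative rather than absolute.

```python
CYCLONE_ISSUE_WIDTH = 6  # micro-ops per cycle, from the LLVM commits


def peak_uops_per_second(issue_width, clock_ghz):
    """Theoretical ceiling: micro-ops per cycle times cycles per second."""
    return issue_width * clock_ghz * 1e9


def clock_uplift(old_ghz, new_ghz):
    """Fractional gain from a clock increase, all else being equal."""
    return new_ghz / old_ghz - 1.0


iphone_5s = peak_uops_per_second(CYCLONE_ISSUE_WIDTH, 1.3)
ipad_air = peak_uops_per_second(CYCLONE_ISSUE_WIDTH, 1.4)
uplift = clock_uplift(1.3, 1.4)

print(f"iPhone 5s: {iphone_5s / 1e9:.1f} billion micro-ops/s peak")
print(f"iPad Air:  {ipad_air / 1e9:.1f} billion micro-ops/s peak")
print(f"Clock uplift from 1.3GHz to 1.4GHz: {uplift:.1%}")
```

The 100MHz step between the 5s and the iPad Air is worth under 8% at best, which is why a process shrink that allows a larger clock bump is the obvious lever.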
Looking at Cyclone makes one thing very clear: the rest of the players in the ultra mobile CPU space didn't aim high enough. I wonder what happens next round.
Read More ...
NASA: Say Goodbye to Buzz Lightyear Suit, Hello to Z-2
New designs may lack the flair of the Z-1, but improve technically
Read More ...
Supposed iPhone 6 Leaked Sketches, Images Show Body w/Rounded Edges
If accurate, Android better hope Apple didn't patent the more widely rounded edge!
Read More ...
NHTSA Proposes Legislation Requiring Backup Cameras on All Light Vehicles
This spans cars, SUVs, trucks and vans
Read More ...
Primate Stem Cell Creation Appears Driven by Genes From Ancient Virus
Stealing genes from HERV-H may have offered a safer way to induce pluripotency in primate embryos
Read More ...
Quick Note: Phil Spencer Named New Head of Xbox
Spencer is stepping up after Chief Product Officer of Xbox Marc Whitten left earlier this month
Read More ...
Apple, Samsung Head to Court in Second U.S. Mobile Patent Trial, Google to Speak Up
This case has more to do with Android software than Samsung's hardware
Read More ...
25-Year Microsoft Executive Leaves the Company
He wants to "see what the non-Microsoft world has to offer"
Read More ...
Directory of "Illegal Websites" Aims to Cut Off Pirates' Ad Revenue
New effort looks to directly take away a top source of pirate booty
Read More ...
T-Mobile Axing Monthly Corporate Discounts
It will stay available for government and military, though
Read More ...
India, China Yet to Approve Parts of Nokia Sale Amid Strikes, Tax Disputes
Nokia and Microsoft are struggling to convince Chinese and Indian governments to buy into the purchase
Read More ...
Available Tags: Dell, GIGABYTE, iPhone 6, iPhone, Xbox, Samsung, Google, Microsoft, Nokia







