Sheep Guarding Llama
Scott Alan Miller :: A Life Online

CPUs, Cores and Threads: How Many Processors Do I Have?

March 7th, 2008 by Scott Alan Miller

In my job role I am very often called upon to determine how many “processors” a machine has or how many we will need for a specific task. Ten years ago this was a simple process but today even the very concept of the “processor” is fuzzy at best and only a very few people have a clear understanding of what it means. I spend much of my time explaining, as best as possible, the terms needed to even discuss processors today as everything processor related must be seen in context.

Before we begin let us look at the terms involved in discussing processors, starting from the bottom of the stack. On the bottom we have a chip carrier. This can be something as simple as a motherboard (a.k.a. mainboard, systemboard, MoBo), a processor daughtercard or a dedicated chip carrier. Any of these will qualify for our use. A chip carrier holds sockets. Chip carriers can have a single socket or many.

On a standard desktop or laptop computer we would expect to find that we have a single motherboard containing a single socket. On a mid-sized server such as the HP DL360 G5 we see a single motherboard with two sockets. On a larger server such as the Sun SunFire x4600 we see several daughtercards each with a single socket but with a total of eight sockets available in the overall system.

The Intel Pentium III Slot 1 chips are a perfect example of a dedicated chip carrier. In the case of the Slot 1 Pentium III processors the processor itself was mounted directly onto a small daughterboard that was dedicated to the purpose of carrying the Pentium III processor and its associated voltage management electronics. This small card was enclosed in plastic for protection and would be attached directly to the motherboard.

Above the layer of the chip carrier is the socket. A socket is a physical connector that allows a chip to be connected to a board. Occasionally a chip as important as the CPU will be connected to a board without a socket. This is more common when dealing with embedded systems and is exceedingly rare in general purpose computing. In this case the connection itself could be considered analogous to the socket. The use of the socket in this explanation can be confusing because of its questionable interpretation, but its inclusion is important because of the need to identify potential system capacity and classification, which is normally done at this level.

It is in the counting of sockets per computer that we determine the maximum “way” of a server. For example the DL360 mentioned above is classified as a two-way server and the x4600 is an eight-way server. This is the case when the server is at capacity. A particular server would be classified by the number of sockets in use. For example a DL385 with just one socket occupied could be considered a one-way server but with extra potential capacity. By adding another chip to the second socket we are said to be upgrading from a one-way to a two-way. Many server vendors have started advertising the “way” of their servers based on non-socket based factors but this practice is non-standard and highly misleading. Be sure to compare servers based on socket capacity and not on advertised “way”.
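The socket-counting rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here; the function name `classify_way` and the dictionary keys are my own invention and not part of any vendor tool.

```python
def classify_way(total_sockets: int, populated_sockets: int) -> dict:
    """Return the maximum and current "way" of a server from its socket counts."""
    if not 0 <= populated_sockets <= total_sockets:
        raise ValueError("populated sockets must be between 0 and total sockets")
    return {
        "max_way": total_sockets,          # capacity: every socket filled
        "current_way": populated_sockets,  # as configured today
        "free_sockets": total_sockets - populated_sockets,
    }


# A two-socket server with only one processor installed:
# a one-way machine today with room to become a two-way.
print(classify_way(total_sockets=2, populated_sockets=1))
```

The point of keeping the two numbers separate is that the maximum belongs to the chassis while the current value belongs to the configuration you actually bought.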

Each socket is capable of holding one physical processor. While sockets are purchased with the board to which they are attached, a chip can be purchased already in a socket or as a standalone product. Processors are often sold in stores in boxes just like any other product and are the most visible form of “processor” that consumers will face. This is the only “processor” that can be seen, held in the hand, bought as an item in a store, etc. This is the physical manifestation of processing power. Just as socket count determines the maximum “way” of a computer, the processor count will determine the current “way” of that computer. Most consumers or desktop administrators think of processors in terms of the physical chip. If the term processor is to have an official usage this is the level at which it is most appropriate. Common examples of a processor include the AMD Opteron, Intel Core, Intel Pentium II, Sun UltraSparc IV or IBM Power6.

The most important industry recognition of this “level” being the “processor” is Microsoft, Oracle and most major software vendors using this definition of processor to determine their per-processor licensing requirements. Because of this stand on the definition of processor and its long history of use mostly in this context we are likely to see the word processor remain linked to the physical entity.

Each processor chip can have one or more dies carried within it. A die is not visible as it is encased in the protective material of the processor. The die consists of the semi-conductive substrate and is a discrete electrical element within the processor. A die is the most difficult portion of a processor to define, in my opinion, as it is completely invisible to anyone unless they break apart a processor, and even then dies are extremely difficult to see because of their size and density.

A CPU, or Central Processing Unit, is, and has been, generally tied to a die. One die contains one single CPU. A die and a traditional CPU are, roughly, synonymous. Technically an important difference remains because a die can contain components in addition to the CPU such as support processing. In a more general sense, a die can contain types of integrated circuits other than a CPU, so the two words are not the same thing even though they effectively are when we are only discussing general use processors – CPUs. Strangely it is at this level that we arrive at the term CPU, used so commonly but so extremely misunderstood.

Within a single CPU there can be one or more processing cores. A core is the real workhorse of the processor stack. It is within a core that the actual processing work is done. It is most common, today, for a CPU to contain only one core. There is a common misconception that this is not the case due to marketing efforts to convince people otherwise. Internal processor architecture should not be used as a marketing tool as it is simply confusing and misleading. Only a holistic view of processor performance characteristics can provide adequate comparisons when deciding on a processing platform. No single architectural element will have an impact large enough to be usable as a determining factor in processor selection. But more importantly it is not feasible for anyone who is not a chip architect with a solid grounding in IC design concepts to even remotely grasp the intricacies involved in the design of a microprocessor.

In traditional processors, like the Pentium III, there is one core per CPU. This is very simple. In many modern processors such as the Intel Core or the Intel Core 2 there is still only one core per CPU while there are multiple CPUs per processor (each CPU is on a discrete die within the processor.) So an Intel Core 2 Duo would be a single processor with two dies, each with one CPU, each with one core. This gives a total of two cores per processor. It is multi-core as well as being multi-CPU. Technically the term multi-core should not apply here as that is only useful in a different and important context. In the AMD Opteron processor we see a single processor with a single die and single CPU with two cores within that single CPU. In this case we have a multi-core single-CPU configuration. This is a true multi-core processor. Multi-core within a single die/CPU is an important distinction because it varies the ability for components to communicate amongst themselves. The most confusing thing here is that the Intel product is named “Core” while being based on multi-CPU technology. This has led to a proliferation in the misuse of the term core.
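To make the multi-CPU versus multi-core distinction concrete, here is a minimal Python sketch. It encodes the two layouts exactly as this essay describes them, not as taken from vendor data sheets, and the class name `ProcessorLayout` and its fields are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorLayout:
    name: str
    dies: int           # one CPU per die, following this essay's usage
    cores_per_die: int

    @property
    def total_cores(self) -> int:
        return self.dies * self.cores_per_die

    @property
    def is_multi_cpu(self) -> bool:
        # More than one die / CPU inside the one physical processor.
        return self.dies > 1

    @property
    def is_true_multi_core(self) -> bool:
        # Multi-core in the strict sense: several cores inside one die / CPU.
        return self.cores_per_die > 1


core2_duo = ProcessorLayout("Intel Core 2 Duo (as described above)", dies=2, cores_per_die=1)
opteron = ProcessorLayout("AMD Opteron dual-core", dies=1, cores_per_die=2)

for p in (core2_duo, opteron):
    print(p.name, p.total_cores, p.is_multi_cpu, p.is_true_multi_core)
```

Both layouts arrive at two cores per processor. They differ only in where those cores live, which is exactly the distinction the marketing names tend to hide.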

Cores are still an extremely important component to use in normal system discussions, however. Cores are discrete processing elements and therefore represent a very important look at our computers. By looking at cores we can see how many independent parallel actions can be taken by the processors at one time. This is very important for understanding the scaling and capacity abilities of our computers. A computer can only truly parallelize to the extent of its “core” capacity.

The final layer of our stack that we need to examine is that of the multithread (a.k.a. Hyper-Thread, SuperThread, etc.) The most well known example of this is Intel’s implementation of such used in their Pentium 4 derived XEON processors. In current use the Sun UltraSparc T family of processors are the poster-children for multithreading. Multithreading does not truly add additional parallelism to the processing structure but it can be used, under certain loads, to make the processing pipeline more efficient and to push multiple threads of execution into the processor roughly simultaneously. Multithreading is complicated but in the absolutely simplest terms (and possibly the most useful to the layman looking to grasp the correct use of this technology) it can be thought of as allowing the processor to manage thread execution and scheduling instead of leaving this solely to the operating system. In reality what is performed is vastly more complicated than this.

Multithreading is useless for single-threaded workloads, where its mere presence can degrade performance. Multithreading is most useful for highly threaded workloads. It is currently seeing a lot of positive use in the areas of web servers and databases. To transfer decision making from the operating system to the multithreading portion of the processor, an MT processor presents each of its thread processors to the operating system as a separate “logical processor”. It is at this point that we finally see the concept of processor as viewed by the operating system. This “logical processor” is what we view in Microsoft’s PerfMon or TaskMgr or in top on Linux. Often this is what we think of as being the processor.
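To see the operating system’s view directly, the following Python sketch counts sockets, cores and logical processors from text in the style of Linux’s /proc/cpuinfo. It assumes the x86 Linux field names `processor`, `physical id` and `core id`; other architectures and some virtual machines do not expose all of them, so treat this as a rough illustration and not a portable tool. The function name and the sample text are my own.

```python
import os


def count_from_cpuinfo(text: str) -> dict:
    """Count sockets, cores and logical processors from /proc/cpuinfo-style text."""
    sockets = set()
    cores = set()
    logical = 0
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between logical processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1          # one stanza per logical processor
            physical_id = None
        elif key == "physical id":
            physical_id = value   # which socket this logical processor sits in
            sockets.add(value)
        elif key == "core id":
            cores.add((physical_id, value))  # core ids repeat across sockets
    return {"sockets": len(sockets), "cores": len(cores), "logical": logical}


# Hypothetical excerpt for one processor with one core and two hardware threads.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 0
"""

print(count_from_cpuinfo(SAMPLE))
# The number of logical processors on the machine running this script (may be None).
print(os.cpu_count())
```

On a Linux machine you could pass the contents of /proc/cpuinfo to the same function. Note that `os.cpu_count()` reports logical processors only, which is the same number that top or TaskMgr shows.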

Now that we have been bombarded with terms, layers and models we will look at a few examples to help determine how we should approach the classification of processors. We will look at the HP DL360 G2, the HP DL585 G2, the HP DL580 G4, the HP Compaq DeskPro d530 and the Sun SunFire T2000.

In our first example we will look at the very traditional and standard Hewlett-Packard/Compaq Proliant DL360 G2. This server has a single motherboard containing two processor sockets. Each socket accepts one Intel Pentium III-S processor (up to 1.4GHz.) At this level we can identify this server as a true two-way server. Each Pentium III-S processor contains a single die / CPU. Each CPU has one core and each core is natively threaded with no multithreading capabilities. So, in total, this server is a two-way server with two processors, two CPUs, two cores and two logical processors to present to the operating system. Very simple, very straightforward. Just as we expect a computer to behave.

Our second example is the Hewlett-Packard Proliant DL585 G2. This server has four processor sockets on its motherboard making it a true four-way server. Each socket can hold an AMD Rev F Opteron Dual-Core processor. Each Opteron, in this scenario, has a single die with a single CPU. Each CPU has two cores and each core has only the native thread handler providing a total of one logical processor per core. So our total is four-way, four die / CPU, eight core and eight logical processors presented to the operating system.

Our third example is the Hewlett-Packard Proliant DL580 G4. The Proliant DL580 G4 has a four socket motherboard capable of holding four Dual-Core Intel XEON 7000 series processors. This, like the DL585 G2, is a true four-way server when fully populated. Each XEON 7000 processor contains dual dies / CPUs and each CPU contains one core for a total of two cores per processor. Each core has a single native thread handler. So our total is four-way, eight die / CPU, eight core and eight logical processors presented to the operating system.

My desktop example is the Hewlett-Packard Compaq DeskPro d530. This desktop unit has the option of using the Intel Pentium 4 HyperThread processor which is what makes it interesting for our purposes. We will use this processor in our example. The DeskPro d530 has a motherboard that supports a single Pentium 4 (or Celeron 4) processor. Like most desktops this is a one-way machine. Each Pentium 4 processor has a single die / CPU with a single execution core. Each core on a traditional Pentium 4 (or Celeron 4) can execute just a single thread but, in our example, we will use the HyperThread version of the P4 which can handle two simultaneous threads presenting two logical processors to the operating system. So we have a one-way desktop with a single processor with a single CPU containing a single core with two multithread handlers presenting two logical processors.

To make this analysis more complicated we must also be aware that because of single thread performance problems on the Pentium 4 HT platform it was very common for HyperThreading to be disabled on these processors through a BIOS setting. In these cases the threading model returns to native and only a single logical processor is presented to the operating system. This is the only example, of which I am aware, of a processor having a selectable number of presentable logical processors. The efficacy of using the HyperThread features was based upon operating system and load characteristics. For example, Windows 98SE or ME running on the d530 could not even see the second logical processor because they only have a uni-processor kernel option, so HyperThreading is not even possible. With Windows 2000 or XP both logical processors were visible and usable but some workloads, such as most video games at the time, could not take advantage of them while many business workloads would. Each user would have to determine which mode made the most sense for them, adding to the complexity of the situation.
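The d530 cases above can be reduced to a small Python sketch. It is a deliberate simplification: the function name and both boolean flags are hypothetical, and the kernel is modeled as either uni-processor or multi-processor with nothing in between.

```python
def logical_processors_visible(cores: int, threads_per_core: int,
                               ht_enabled: bool, smp_kernel: bool) -> int:
    """How many logical processors the operating system can actually use."""
    # With HyperThreading off in the BIOS, each core falls back to one native thread.
    presented = cores * (threads_per_core if ht_enabled else 1)
    # A uni-processor kernel can only ever see one logical processor.
    return presented if smp_kernel else min(presented, 1)


# The DeskPro d530 with a HyperThread Pentium 4: one core, two thread handlers.
print(logical_processors_visible(1, 2, ht_enabled=True, smp_kernel=True))    # Windows 2000 / XP
print(logical_processors_visible(1, 2, ht_enabled=False, smp_kernel=True))   # HT disabled in BIOS
print(logical_processors_visible(1, 2, ht_enabled=True, smp_kernel=False))   # Windows 98SE / ME
```

Only the first configuration yields two usable logical processors; the other two collapse back to one for different reasons.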

Our final example is the Sun SunFire T2000 server. The SunFire T2000 is a single socket motherboard designed to hold one UltraSparc T processor. This is a true one-way server. Each UltraSparc T processor has a single die / CPU. Each CPU contains either four, six or eight cores depending on the purchased configuration – we will use eight in our example. Each of these eight cores has four thread handlers. In this machine we therefore see a one-way server with a single processor with a single CPU containing eight cores and a total of thirty-two simultaneous multithreads being presented to the operating system as thirty-two logical processors.
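The arithmetic behind all five examples is the same multiplication down the stack, which can be sketched in Python as follows. The class `SystemConfig` and its field names are only illustrative, every machine is assumed to be fully populated, and the figures are those given in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemConfig:
    name: str
    sockets: int
    dies_per_processor: int   # one CPU per die, as used throughout this essay
    cores_per_die: int
    threads_per_core: int

    def totals(self) -> dict:
        processors = self.sockets                    # assumes every socket is filled
        cpus = processors * self.dies_per_processor
        cores = cpus * self.cores_per_die
        logical = cores * self.threads_per_core
        return {"way": self.sockets, "processors": processors,
                "cpus": cpus, "cores": cores, "logical": logical}


EXAMPLES = [
    SystemConfig("HP DL360 G2", 2, 1, 1, 1),
    SystemConfig("HP DL585 G2", 4, 1, 2, 1),
    SystemConfig("HP DL580 G4", 4, 2, 1, 1),
    SystemConfig("HP Compaq DeskPro d530 (HT on)", 1, 1, 1, 2),
    SystemConfig("Sun SunFire T2000 (8 cores)", 1, 1, 8, 4),
]

for system in EXAMPLES:
    t = system.totals()
    print(f"{system.name:32} way={t['way']} processors={t['processors']} "
          f"cpus={t['cpus']} cores={t['cores']} logical={t['logical']}")
```

Laid out this way it is easy to see that the DL585 G2 and DL580 G4 present identical core and logical processor counts while differing in die / CPU count, and that the one-way T2000 presents more logical processors than either four-way machine.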

As computer systems continue to increase the number of logical processors being presented to the operating system, efficient process and thread handling by the operating system kernel will continue to become more and more important. Many traditional systems have not been able to handle multi-processor situations very efficiently, if at all, but today, with the number of available logical processors skyrocketing even in desktops, good process and thread handling across a potentially large number of logical processors is essential.

As you can see the issue of determining the number of processors, cores, CPUs, etc. is extremely difficult. It is clear why people have become confused and why marketing is playing such a significant role in determining the public’s perceptions of these architectural components. The most important components to keep clear are the counts for way, processor, core and logical processor (virtual processor, processing thread, execution engine, etc.) Underlying component issues, while important for semantic correctness and for understanding the workings of processors, are still underlying components and should not be thought of as being the defining characteristics of our computer systems today.

Posted in Essays, Tech