Enhancements[ edit ] Enhancements to the original implementation include increasing the radix and sparsity of the adder. The radix of the adder refers to how many results from the previous level of computation are used to generate the next one. Doing so increases the power and delay of each stage, but reduces the number of required stages. In the so-called sparse Kogge—Stone adder SKA the sparsity of the adder refers to how many carry bits are generated by the carry-tree.
|Published (Last):||20 November 2017|
|PDF File Size:||20.84 Mb|
|ePub File Size:||2.52 Mb|
|Price:||Free* [*Free Regsitration Required]|
I took classes on this in school, so I had a basic understanding, but the more I thought about it, the more I realized that my ideas about how this would scale up to bit computers would be too slow to actually work. I started digging around, and even though wikipedia is usually exhaustive and often inscrutable about obscure topics, I had reached the edge of the internet.
I had to do actual research of the 20th-century kind. So come with me over the precipice and learn — in great detail — how to add numbers! Adding in binary For big numbers, addition by hand means starting on the rightmost digit, adding all the digits in the column, and then writing down the units digit and carrying the tens over. This works the same in binary, but the digits can only ever be 0 or 1, so the biggest number we can add is 1 plus 1.
That still only carries a 1, which is convenient, because it means the carry can be represented in binary just like every other digit. It gives you a bit more intuition when dealing with logical equations, which will come up later.
One way to think of it is: According to the logic table we just made, the sum should be 1 if there are an odd number of incoming 1s. XOR is the operation that matches odd inputs. And the carry should be 1 if at least two of the incoming digits are 1.
Adding in circuitry The most straightforward logic circuit for this is assuming you have a 3-input XOR gate. And if we put a bunch of them in a row, we can add any N-bit numbers together! Starting along the top, there are four inputs each of A and B, which allows us to add two 4-bit numbers. Imagine setting up 64 of those adders in a chain, so you could add two bit numbers together.
How long would it take? The circuit diagram above shows that each sum goes through one or two gates, and each carry-out goes through two.
And the carry-out of one adder becomes the carry-in for the next one. Uh oh. Carry-select adder The trick that seems most obvious to me — and the only one I thought of before doing research — was apparently invented in by Sklansky. One computes the sum with a carry-in of 0, and the other computes with a carry-in of 1. When the real carry-in signal arrives, it selects which addition to use. A mux takes two inputs and selects one or the other, based on a control signal.
In this case, each mux uses the carry-in signal to determine which adder output to use, for each of the four sum bits along the bottom , and the carry-out bit on the left. The diagram gets simpler if we make a shortcut box for a series of connected adder units, and draw each group of 4 input or output bits as a thick gray bus: Now, for example, to compute the sum of two bit numbers, we can split each number into four chunks of four bits each, and let each of these 4-bit chunks add in parallel.
For a bit adder, it would take 24 delays, because it would have 16 muxes instead of 4. Going from to 24 is a great start, and it only cost us a little less than twice as many gates! We can fuss with this and make it a little faster. If we compute only one bit at a time on the right, then two, then three, and so on as it goes left, we can shave off a few more. But… we can do better. Next time, some tricker adding methods that end up being quicker. Be sure to read part 1 before diving into this!
Both of these cases are the same whether the carry-in is 0 on 1. It will have a carry-out if it generates one, or it propagates one and the lowest bit generated one, or it propagates one and the lowest bit propagates one and the carry-in was 1. Parallel in small doses This series can go on indefinitely. We could compute each carry bit in 3 gate delays, but to add 64 bits, it would require a pile of mythical input AND and OR gates, and a lot of silicon. That adds one more gate, for a total of 4 gate delays to compute the whole 2-bit sum.
If we built a set of 4-bit adders this way — assuming a 6-way OR gate is fine — our carry-select adder could add two bit numbers in 19 gate delays: 3 for all of the carries to be generated, and 16 for the muxes to ripple down. These ripples now account for almost all of the delay. Kogge-Stone In , probably while listening to a Yes or King Crimson album, Kogge and Stone came up with the idea of parallel-prefix computation. Their paper was a description of how to generalize recursive linear functions into forms that can be quickly combined in an arbitrary order, but um, they were being coy in a way that math people do.
What they were really getting at is that these G and P values can be combined before being used. If you combine two columns together, you can say that as a whole, they may generate or propagate a carry. If the left one generates, or the left one propagates and the right one generates, then the combined two-column unit will generate a carry. The unit will only propagate a carry bit across if both columns are propagating. This is the country where cowboys ride horses that go twice as far with each hoofstep.
But seriously, it means we can compute the final carry in an 8-bit adder in 3 steps. Wait, what? Well, the numbers at the top represent the computed P and G bit for each of the 8 columns of our 8-bit adder.
The diamonds combine two adjacent sets of columns and produce a new combined P and G for the set. If this works, at the bottom, each arrow should represent the combined P and G for that column and every column to its right.
Look at the line on the far left, and trace it back up. It combines the lines from 7 and 3, and as we trace that up again, each of those combines two more units, and then again to cover all 8 columns.
The same path up should work for each column. As we saw above, each combining operation is two gates, and computing the original P and G is one more. For a bit adder, we need 6 combining steps, and get our result in 16 gate delays! The Kogge-Stone adder is the fastest possible layout, because it scales logarithmically.
Every time we add a combining step, it doubles the number of bits that can be added. It might even monopolize a lot of the chip space if we tried to build it. A nice paper from compares several adder strategies and decides that this one is the most energy-efficient because of the trade-off of speed for simplicity.
That is, it can be built easier than the Kogge-Stone adder, even though it has nearly twice as many combination steps in it. This is more than our best-case of 16 for the Kogge-Stone adder, and a bit more than our naive-case of 24 with the carry-select adder. You can see this especially in column 3. That reduces the fan-out back to 2 without slowing anything down. So if we were to combine this strategy with the carry-select strategy from last time, our carry bits could start rippling across the adder units before each unit finishes computing the intermediate bits.
When a carry-select adder is used with k units, the ripple delay is k plus the time it takes to get a carry-out from the first unit. So if we split our bit adder into 8 8-bit Brent-Kung adders, and combine those into a carry-select adder, the 8-bit adders will compute their carry-out bits in 9 gate delays, after which the carry bits ripple through the muxes for 7 gate delays, for a total of The sum bits are available after 14 gate delays, in plenty of time. So we got it down to 16 total, and this time in a pretty efficient way!
Adding numbers: Proof that humans can make anything complicated, if they try hard enough. There are a bunch of other historical strategies, but I thought these were the most interesting and effective.