wrote:
>
>The clock counts are 19, 33 and 39 for single, double and extended prec
>...
> cycles = ((bits_in_mantissa + 1) >> 1) + 6;
Assuming 24, 53, and 64 bits in the respective mantissas (counting the
leading bit), this formula gives 18, 33, 38 cycles. The formula
bits/2 + 7 gives the desired result.
>(The first iteration on the SRT radix-4 divider will only generate a single
>bit, while all subsequent cycles gives two more bits.)
I think I know what you mean, but it should be pointed out that the
first iteration can generate two bits. Consider the case
1.1111/1.0000, for which P > 5/3D and hence for which the plot on page
6 of the Intel white paper shows that digit +2 is mandatory. The
example in Figure 4-2 indicates that at the first iteration the
operands are aligned. (rp(0) in the figure seems to be a typo, I think
it should be P(0).) Hence 1.1111/1.0000 (or any dividend above
1.10000) forces this case to produce the quotient 10, two bits, at the
first iteration.
I think what you mean is that ultimately (possibly not until the very
end if the next 30 quotient digits are 0) that leading 1 is wiped out by
a subtraction. Hence it should not be counted as one of the developed
quotient bits for the purpose of stopping when you have enough quotient
bits.
But then the case 1.0000/1.1111 (more generally, whenever D
References: <013873XIAEGMJREVNGYB@abq-ros.com> <3er7tr$o8f@vkhdsu01.hda.hydro.com>
NNTP-Posting-Host: sunburn.stanford.edu
In article <3er7tr$o8f@vkhdsu01.hda.hydro.com>,
Terje Mathisen wrote:
>
>The clock counts are 19, 33 and 39 for single, double and extended prec
>...
> cycles = ((bits_in_mantissa + 1) >> 1) + 6;
Assuming 24, 53, and 64 bits in the respective mantissas (counting the
leading bit), this formula gives 18, 33, 38 cycles. The formula
bits/2 + 7 gives the desired result.
>(The first iteration on the SRT radix-4 divider will only generate a single
>bit, while all subsequent cycles gives two more bits.)
I think I know what you mean, but it should be pointed out that the
first iteration can generate two bits. Consider the case
1.1111/1.0000, for which P > 5/3D and hence for which the plot on page
6 of the Intel white paper shows that digit +2 is mandatory. The
example in Figure 4-2 indicates that at the first iteration the
operands are aligned. (rp(0) in the figure seems to be a typo, I think
it should be P(0).) Hence 1.1111/1.0000 (or any dividend above
1.10000) forces this case to produce the quotient 10, two bits, at the
first iteration.
I think what you mean is that ultimately (possibly not until the very
end if the next 30 quotient digits are 0) that leading 1 is guaranteed
to be wiped out by a subtraction. Hence it should not be counted as
one of the developed quotient bits for the purpose of stopping when you
have enough quotient bits.
But if that's what you mean, then the other extreme 1.0000/1.1111 (more
generally, whenever D
References: <3f9qlq$9ds@ornews.intel.com> <3fh47f$mgq@news.jf.intel.com>
NNTP-Posting-Host: sunburn.stanford.edu
In article ,
Anil Das wrote:
>In article <3fh47f$mgq@news.jf.intel.com>, proussel@pdx144.intel.com (Patrice
>Roussel) writes:
>> All the entries of the guess PLA of the Pentium were exercized (something
>> you
>> can do with few thousand (dividend,divisor) pairs). Unfortunately the
>> problem was that some entries were missing :-(.
>
> What is meant by `missing' here? If dividend/divisor pairs
>that use those entries where tested, they will come out as 0 instead
>of 2, and the problem would have been found, right?
This depends on how they came to be 0. The Intel white paper
attributes the missing entries to an error in a script written to
download entries into a PLA, in which case such a test would presumably
have caught this.
However at a lecture I gave on the Pentium bug a week ago at Stanford,
William Kahan who was in the audience said that the five entries were
missing because it had been proved (erroneously) that they were never
accessed by the division algorithm, like all the entries above them in
the table. Obviously one cannot test any of the inaccessible entries
by running the algorithm, so if you believe these five are inaccessible
then you will of course not provide a test case for them.
Having heard such conflicting accounts from Intel and Kahan, neither of
whom I am competent to contradict, I am now thoroughly mystified as to
the real origin of the bug. At this point the only apparently relevant
fact about the table that I am 100% certain of is the following.
In the five erroneous columns of the PD-plot (these being the ones we
can measure, thanks to the bug), the six thresholds (three each in the
positive and negative halves) are set to values I have tabulated
below. The relevant fact is that these values would be perfectly
symmetric about -1 had the erroneous threshold (at the top of the
positive half) been set correctly, namely one cell higher. This higher
threshold would then not illegally penetrate the 8/3 D line (shown in
the Intel white paper), the root cause of the problem.
(The reason -1 is the appropriate midpoint is that sampling truncates
the partial remainder towards -oo in both halves, by an amount between
0 and 2 thanks to the redundant carry-save representation of the
partial remainder. Had an irredundant representation been used the
truncation would have been by an amount between 0 and 1 and the
appropriate midpoint would have been -0.5. Had the sampled partial
remainder been truncated towards oo instead of -oo the midpoint would
have been positive instead of negative.)
In order for the bug to have been caused by a faulty proof, both those
who wrote the proof and those who read it would need to believe that a
correct table could have the inner two threshold pairs (between
absolute-value quotient digits 0 and 1, and between 1 and 2) perfectly
symmetric yet have the outer pair (between 2 and "3") asymmetric. I
have been unable to come up with any plausible failure mode for the
proof which would reasonably account for confidence in such a
peculiarly isolated asymmetry.
Perhaps none of the parties examining the table noticed that the inner
two pairs of thresholds were symmetric. Or perhaps some reason was
advanced for why there should be a single exception to the symmetry
that somehow everyone involved found convincing. I find neither of
these scenarios terribly plausible, and am therefore mystified as to
what form a faulty proof could have taken.
Here are the observable thresholds, originally told to me by Tim Coe
and which I have since confirmed independently for myself by verifying
that any other choice of thresholds would be inconsistent with the
observed behavior of the buggy Pentium. I've laid this out here as a
transposed PD-plot, with the five observable divisors (17,20,23,26,29)
indexing five rows (oriented as columns in the white paper), and with
the entries (quotient digits) in each region of the table indexing the
columns. The entries in my table are the ranges of chopped (sampled)
partial remainder which yield the indicated quotient digits. The six
vertical bars in the table separate the seven regions and constitute
the thresholds between the regions. (An unambiguous way to realize
these as numeric thresholds might be as the mean of the two adjoining
chopped partial remainders, e.g. at D=17 the 1-2 threshold, between 11
and 12, could be considered to be set at 11.5.)
\ q = 0("-3") -2 -1 0 1 2 0("3")
D_5\---------------------------------------------------------------------
17=1.0001 ...-26 | -25..-14 | -13..-5 | -4..2 | 3..11 | 12..22 | 23...
| | | | | |
20=1.0100 ...-30 | -29..-16 | -15..-6 | -5..3 | 4..13 | 14..26 | 27...
| | | | | |
23=1.0111 ...-34 | -33..-18 | -17..-6 | -5..3 | 4..15 | 16..30 | 31...
| | | | | |
26=1.1010 ...-38 | -37..-20 | -19..-7 | -6..4 | 5..17 | 18..34 | 35...
| | | | | |
29=1.1101 ...-42 | -41..-22 | -21..-7 | -6..4 | 5..19 | 20..38 | 39...
ok ok ok ok ok ouch
The first five thresholds are marked as ok. The sixth, marked ouch, is
one too low.
Although the 11 other chopped divisors, namely D=16,18,19,21,22,24,25,
27,28,30,31, are not observable via the bug, Tim conjectures that their
thresholds are those of their "ceilings" defined as the smallest higher
buggy divisor, e.g. D=18 and 19 have the exact same thresholds as
D=20. This leaves D=30 and 31 with unmeasured thresholds, but by
extending the evident arithmetic progressions these are reasonably
conjectured to be
31=1.1111 ...-46 | -45..-24 | -23..-8 | -7..5 | 6..21 | 22..42 | 43...
With that regularity the table would only require 6 "fat" columns (one
per three chopped divisors, less one at each end) instead of 16 (one
per chopped divisor), shrinking the table to 6/16 its original size.
Sampling the divisor to determine which of the 6 columns to use would
only be done once, at setup.
In contrast, sampling the partial remainder is done at every cycle of
the algorithm, and it may well be preferable to keep separate positive
and negative halves of the table to avoid the extra time required to
.
That the actual table is asymmetric is very strong evidence that the
positive and negative halves are represented separately. (But not
conclusive evidence---a simple logic circuit could in principle have
been used to lower the top threshold by 1 for positive divisors. Such
an ad-hoc addition to the logic is however almost surely excluded on
the ground that it would serve no purpose even if the lower threshold
*had* been correct.)
Here is a slick and easily proved closed form formula for the entries
of the positive half of a *correct* table. If D (between 16 and 31) is
the chopped divisor, let d = D/3 + 1 (between 6 and 11, serving as an
index to the 6 "fat" columns), and let p be the chopped partial
remainder, p >= 0. Take the PD-plot table entry at row p and fat
column d to be given by the closed form formula
(d <= 2*p) + (2*d <= p)
where Booleans are numerically 0 if false and 1 if true (as in C).
Thus at D = 17 we have d = 17/3 + 1 = 5 + 1 = 6, and we then get for
p=2,3,11,12 the values
p d <= 2*p 2*d <= p (d<=2*p) + (2*d<=p)
----------------------------------------------------
2 0 0 0
3 1 0 1
11 1 0 1
12 1 1 2
These four examples show that this closed form formula locates the two
inner thresholds exactly where the Pentium puts them, at least for
D=17; similar calculations confirm that this formula tracks the
Pentium's inner two thresholds perfectly at the other divisors.
One can imagine ways to evaluate this formula sufficiently fast during
each division cycle as to obviate the need for a precomputed table of
its values.
This formula makes the inaccessible entries 2. If it is desired to
reduce power by setting these to something else, e.g. 0 or 3, this can
be safely accomplished using the easily proved fact that, with one
exception, they are precisely those entries satisfying 4*d <= p. Note
that this formula does *not* produce the notorious bug, which
corresponds to using the more complicated formula 4*d <= p+1 (in the
positive half), which compromises the rightmost third of each fat
column. The one exception mentioned above is for d = 11, p = 43, where
thanks to the absence of column D=32 this more complicated formula does
not produce the bug, else there would be 6 groups of problem divisors
instead of 5. However I would imagine the tiny economy of setting one
additional entry to 0 is not worth the trouble and attendant risk of
error; stick to 4*d <= p uniformly and let d=11, p=43 stay at 2.
The negative half of the table is obtained simply by reflecting the
positive half about -1, also easily proved correct (an outline of this
proof is in the parenthetical paragraph near the beginning of this
message starting "The reason").
Had these simple and easily derived formulas been known for the PD-plot
table, all this trouble would have been avoided. There is *nothing* so
effective as simplicity to avoid bugs, whether in computers,
mathematics, or any other area where embarrassing and/or expensive
mistakes are possible.
--
Vaughan Pratt FTP: boole.stanford.edu:/pub/ABSTRACTS
http://boole.stanford.edu/boole.html
=================================================================
-------------------------PENTIUM REPORT # bug74----------------------
Vaughan R. Pratt
Department of Computer Science
Stanford University
Stanford, CA 94304
pratt@cs.stanford.edu
======================================================================
Article 31190 of comp.sys.intel:
Path: Radon.Stanford.EDU!news.Stanford.EDU!Sunburn.Stanford.EDU!pratt
From: pratt@Sunburn.Stanford.EDU (Vaughan R. Pratt)
Newsgroups: comp.sys.intel
Subject: Re: Pentium Math Error
Date: 21 Jan 1995 20:03:26 GMT
Organization: Computer Science Department, Stanford University.
Lines: 34
Message-ID: <3frpae$ovk@Radon.Stanford.EDU>
References:
NNTP-Posting-Host: sunburn.stanford.edu
In article ,
Warren Rath wrote:
>
>Now ask your self how many times does the calculation of
> 4,195,835 / 3,145,727 actually come up in computing??
Once every seventeen comp.sys.intel posts, or once a minute. :)
I prefer 5/15 with a millionth knocked off top and bottom, as being
easier to remember than 4195835/3145727. 4.999999/14.999999 should
have 6 3's but the Pentium only produces 4 of them.
If you take the number 1 and randomly add, subtract, multiply, or
divide it by itself, and save the result, and keep doing this with a
mix of more 1's and these results, then in the course of an hour you
will see the bug more than a dozen times.
This is the dual experiment to Intel's. Intel takes a program
consisting of a single instruction, FDIV, sends pairs of random numbers
to it, and gets an error rate of 1 in 9 billion random pairs of
numbers. In contrast my experiment above takes a database starting
with a single number, 1, sends it to random programs, and gets an error
rate of 1 in five minutes of random computing.
This shows that the error rate depends on both the kind of programs you
use and the kind of data flowing through them. If your program
consists of just the FDIV instruction and your data is uniformly
distributed then you should be as safe as Intel says. If you have more
complicated programs and less complicated numbers, the above experiment
indicates that you could be at much greater risk than either Intel or
IBM says.
--
Vaughan Pratt FTP: boole.stanford.edu:/pub/ABSTRACTS
http://boole.stanford.edu/boole.html