## Description

A nonzero finite Float represents a multi-precision floating point number

sign × mantissa × 2**exponent

with 0.5 <= mantissa < 1.0, and MinExp <= exponent <= MaxExp. A Float may also be zero (+0, -0) or infinite (+Inf, -Inf). All Floats are ordered, and the ordering of two Floats x and y is defined by x.Cmp(y).

Each Float value also has a precision, rounding mode, and accuracy. The precision is the maximum number of mantissa bits available to represent the value. The rounding mode specifies how a result should be rounded to fit into the mantissa bits, and accuracy describes the rounding error with respect to the exact result.

Unless specified otherwise, all operations (including setters) that specify a *Float variable for the result (usually via the receiver with the exception of MantExp), round the numeric result according to the precision and rounding mode of the result variable.

If the provided result precision is 0 (see below), it is set to the precision of the argument with the largest precision value before any rounding takes place, and the rounding mode remains unchanged. Thus, uninitialized Floats provided as result arguments will have their precision set to a reasonable value determined by the operands and their mode is the zero value for RoundingMode (ToNearestEven).

By setting the desired precision to 24 or 53 and using matching rounding mode (typically ToNearestEven), Float operations produce the same results as the corresponding float32 or float64 IEEE-754 arithmetic for operands that correspond to normal (i.e., not denormal) float32 or float64 numbers. Exponent underflow and overflow lead to a 0 or an Infinity for different values than IEEE-754 because Float exponents have a much larger range.

The zero (uninitialized) value for a Float is ready to use and represents the number +0.0 exactly, with precision 0 and rounding mode ToNearestEven.

## Examples

// Initialize values we need for the computation. two := new(big.Float).SetPrec(prec).SetInt64(2) half := new(big.Float).SetPrec(prec).SetFloat64(0.5)

// Initialize values we need for the computation. two := new(big.Float).SetPrec(prec).SetInt64(2) half := new(big.Float).SetPrec(prec).SetFloat64(0.5) // Use 1 as the initial estimate.

// Use 1 as the initial estimate. x := new(big.Float).SetPrec(prec).SetInt64(1) // We use t as a temporary variable. There's no need to set its precision

## Float is referenced in 50 repositories

**github.com/golang/go**

- 87 references in src/math/big/float_test.go
- 83 references in src/math/big/float.go
- 12 references in src/math/big/floatconv.go
- 7 references in src/go/constant/value.go
- 7 references in src/math/big/ftoa.go

**github.com/johnny-morrice/godelbrot**

- 14 references in internal/bigbase/bigbasenumerics.go
- 11 references in zoom_test.go
- 10 references in internal/bigbase/bigcomplex.go
- 7 references in config.go
- 7 references in request.go

**github.com/robpike/ivy**

- 15 references in value/unary.go
- 10 references in value/const.go
- 10 references in value/sin.go
- 8 references in value/asin.go
- 8 references in value/binary.go

...