In hist, a histogram is a collection of Axis objects and a
storage. Building on boost-histogram’s
Axis, hist supports six axis types:
Regular, Boolean, Variable, Integer, IntCategory and StrCategory, each with an additional name and label.
Names enable several histogramming shortcuts, which makes HEP studies considerably easier. Note that the name is the identifier for an axis in a histogram and must be unique.
import hist
from hist import Hist

axis0 = hist.axis.Regular(10, -5, 5, overflow=False, underflow=False, name="A")
axis1 = hist.axis.Boolean(name="B")
axis2 = hist.axis.Variable(range(10), name="C")
axis3 = hist.axis.Integer(-5, 5, overflow=False, underflow=False, name="D")
axis4 = hist.axis.IntCategory(range(10), name="E")
axis5 = hist.axis.StrCategory(["T", "F"], name="F")
A histogram consists of several axes. There are currently two ways to create one: pass the axes to the Hist constructor, or chain axes onto Hist.new with the shortcut methods. You cannot add axes to an existing histogram. To distinguish the two approaches, the shortcut methods use abbreviated axis type names (for example, Reg for Regular).
# fill the axes
h = Hist(axis0, axis1, axis2, axis3, axis4, axis5)

# add the axes using the shortcut method
h = (
    Hist.new.Reg(10, -5, 5, overflow=False, underflow=False, name="A")
    .Bool(name="B")
    .Var(range(10), name="C")
    .Int(-5, 5, overflow=False, underflow=False, name="D")
    .IntCat(range(10), name="E")
    .StrCat(["T", "F"], name="F")
    .Double()
)
Hist adds a new
flow=False shortcut to axes that take
underflow and overflow.
AxesTuple, available since boost-histogram 0.8.0, gives you direct access to the properties of the axes in a histogram.
# index h.axes to reach a single axis; h.axes.<property> alone returns a tuple over all axes
assert h.axes[0].name == axis0.name
assert h.axes[1].label == axis1.name  # label will be returned as name if not provided
assert all(h.axes[2].widths == axis2.widths)
assert all(h.axes[3].edges == axis3.edges)
assert h.axes[4].metadata == axis4.metadata
assert all(h.axes[5].centers == axis5.centers)
There are several axis types to choose from.
- hist.axis.Regular(bins, start, stop, name, label, *, metadata='', underflow=True, overflow=True, circular=False, growth=False, transform=None)
The regular axis can have overflow and/or underflow bins (enabled by
default). It can also grow if
growth=True is given. In general, you
should not mix options, as a growing axis will already have the correct
flow bin settings. The exception is
underflow=False, overflow=False, which
is quite useful together to make an axis with no flow bins at all.
There are some other useful axis types based on the regular axis:
- hist.axis.Regular(..., circular=True)
This wraps around, so that out-of-range values map back into the valid range in a circular fashion.
Regular axis: Transforms
Regular axes support transforms, as well; these are functions that convert from an external,
non-regular bin spacing to an internal, regularly spaced one. A transform is made of two functions: a
forward function, which converts external to internal (and for which the transform is usually named), and an
inverse function, which converts from the internal space back to the external space. If you
know the functional form of your spacing, you can get the benefits of a constant performance scaling
just like you would with a normal regular axis, rather than falling back to a variable axis and a poorer
scaling from the bin edge lookup required there.
You can define your own functions for transforms; see Transform. If you use compiled/numba functions, you can keep the high performance you would expect from a Regular axis. There are also several precompiled transforms:
- hist.axis.Regular(..., transform=hist.axis.transform.sqrt)
This is an axis with bins transformed by a sqrt.
- hist.axis.Regular(..., transform=hist.axis.transform.log)
Transformed by log.
- hist.axis.Regular(..., transform=hist.axis.transform.Pow(v))
Transformed by a power (the argument is the power).
- hist.axis.Variable([edge1, ..., ], name, label, *, metadata="", underflow=True, overflow=True, circular=False, growth=False)
You can set the bin edges explicitly with a variable axis. The options are mostly the same as the Regular axis.
- hist.axis.Integer(start, stop, name, label, *, metadata='', underflow=True, overflow=True, circular=False, growth=False)
This could be mimicked with a regular axis, but is simpler and slightly faster. Bins are whole integers only, so there is no need to specify the number of bins.
One common use for an integer axis could be a true/false axis:
bool_axis = hist.axis.Integer(0, 2, underflow=False, overflow=False)
Another could be for an IntEnum (Python 3 or backport) if the values are contiguous.
- hist.axis.IntCategory([value1, ..., ], name, label, metadata="", growth=False)
You can put integers in a category axis; unlike an integer axis, the integers do not need to be adjacent.
One use for an IntCategory axis is for an IntEnum:
import enum

class MyEnum(enum.IntEnum):
    a = 1
    b = 5

# IntCategory takes the category values directly; it has no underflow keyword
my_enum_axis = hist.axis.IntCategory(list(MyEnum))
You can sort the Category axes via the sort method:
h = Hist(
    hist.axis.IntCategory([3, 1, 2], name="Number"),
    hist.axis.StrCategory(["Teacher", "Police", "Artist"], name="Profession"),
)

# Sort Number axis increasing and Profession axis decreasing
# (axes are looked up by name, so the axes are given names here)
h1 = h.sort("Number").sort("Profession", reverse=True)
- hist.axis.StrCategory([str1, ..., ], name, label, metadata="", growth=False)
You can put strings in a category axis as well. The fill method supports lists or arrays of strings to allow this to be filled.
Axes have a variety of methods and properties that are useful. When inside a histogram, you can also access
these directly on the
h.axes object (where h is the histogram), and they return a tuple of valid results. If the property or method
normally returns an array, the
axes version returns a broadcasting-ready version in the output tuple.