What OLS answers, and what it does not
Pay-gap analysis usually moves from reporting the gap, to estimating the residual, to decomposing what contributes to the difference.
Directive reporting - the actually observed pay gap
Mostly unadjusted descriptive metrics; 9.1(g) narrows the comparison to same/equal-value groups.
OLS regression - estimate the unexplained pay gap
One combined model tests whether gender is still associated with pay after adjusting for pay-related factors. OLS regression is the analysis method used by most model providers.
Oaxaca-Blinder - decompose the pay gap
Male and female models split the gap into a detailed explained and an unexplained component.
An OLS residual is not proof of discrimination, but it is a priority for
investigation.
If a variable reduces the gender coefficient, the model has found a statistical
explanation. That does not automatically make the explanation fair, objective, or
gender-neutral.
The regression model explained
log(pay) = gender + grade + job family + location + tenure + performance + error The gender coefficient is the main result for adjusted pay gap analysis as it would represent a pay decrease for women.
Start with the pay outcome you want to explain.
Adjust for pay-related factors.
Gender coefficient: roughly a 4.1% adjusted pay gap.
log(pay)
The pay measure being explained. Log pay makes coefficients easier to read as approximate percentage differences.
gender
The variable of interest. In a common setup, Woman = 1 and Man = 0, so men are the reference group.
controls
Measured pay-related factors such as grade, job family, location, tenure, performance, and working time.
error
The part of pay not explained by the variables in the model.
Data and Excel setup
Start from the data preparation logic: one clean analysis dataset, one row per employee, and documented sources for pay, job, and people fields.
| Field | Use in the model |
|---|---|
| Employee ID | One anonymised row per employee. |
| Gender | Consistent coding, 0 = Male, 1 = Female. |
| Pay measure | Base wage or salary as annual, hourly, and adjusted to 1 FTE. The same approach applies to variable and complementary pay. |
| Log pay | For Excel derived with =LN(pay). |
| Job data | Grade, job family, employee group, location, level, or job profile. |
| People factors | Tenure, qualifications, performance, ... |
Run the model and read the output
Analysis ToolPak
- Enable the Analysis ToolPak.
- Open Data > Data Analysis > Regression.
- Set Input Y Range to the log-pay column.
- Set Input X Range to gender plus all of the control variables.
- Tick Labels if the first row contains headers.
- Run the model and preserve the output with the dataset version.
Advanced Excel users can also use
=LINEST(Y_range, X_range, TRUE, TRUE), but the ToolPak output
is easier to read and preserve for a first model.
Read the output as a set of signals. None of them is a verdict on its own, but together they tell you whether the model is usable and where to investigate. The individual outputs can be interpreted as follows:
Higher means the model explains more pay variation between genders.
This should match the expected analysis population. If it does not, fix the dataset before reinterpreting results.
The absolute size shows strength of association with pay; the sign shows direction. The gender coefficient is the adjusted gap estimate.
Lower p-values give stronger evidence that the estimate is not zero.
Narrower intervals are easier to interpret. Wide intervals mean the estimate is uncertain.
Interpreting the gender coefficient
In a log-pay model, the gender coefficient is the adjusted pay gap estimate. If Woman = 1 and Man = 0, a negative coefficient means women are paid less than comparable men in the model.
=EXP(coefficient)-1 For -0.041, =EXP(-0.041)-1 gives approximately -4.0%.
The p-value suggests the estimated gender coefficient is unlikely to be zero under the model assumptions. That does not automatically prove discrimination, but it is a visible residual gap that requires explanation and investigation.
Checking the model
After reading the coefficient, stress-test the model before drawing conclusions. The result should not depend on a data artefact, a small-sample problem, or an unstable specification.
Data checks
- Population, pay measure, and exclusions are documented.
- Pay values are positive and plausible.
- Missing values and zero values are separated.
- Dummy variables omit one reference category.
Model checks
- There are enough employees for the number of variables.
- Key categories are not too small to interpret.
- Outliers are reviewed before conclusions are drawn.
- The gender coefficient does not move sharply when one variable is added or removed.
- Identify whether the residual gap is concentrated in particular employee groups.