Step 3

Data preparation

Build the dataset before you calculate.
Define who is covered, which pay components are considered, and which source data supports the reporting extract.

Start with the analysis dataset

Calculating the gender pay gap takes more than payroll totals. It needs a complete dataset covering the employees in scope, the pay components in scope, and the source fields needed to connect pay data to the reporting structure.

01

Who is covered?

Start by fixing the employee population as of a specific date. The calculation should cover employees in scope for the reporting period, with clear rules for entities, countries, joiners, leavers, leave cases, working-time patterns, and contract types.
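
A population rule of this kind can be written down as an explicit, testable filter. The sketch below is a minimal illustration, not a prescribed method: the field names, the snapshot date, and the in-scope entities and contract types are all hypothetical and stand in for the organisation's own documented rules. Leave cases and working-time patterns are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical scope rules; real values come from the documented population rule.
SNAPSHOT_DATE = date(2025, 12, 31)
IN_SCOPE_ENTITIES = {"DE01", "FR01"}
IN_SCOPE_CONTRACT_TYPES = {"indefinite", "fixed-term"}


@dataclass
class Employee:
    employee_id: str
    entity: str
    contract_type: str
    hire_date: date
    exit_date: Optional[date] = None  # None means still employed


def in_scope(emp: Employee, snapshot: date = SNAPSHOT_DATE) -> bool:
    """Return True if the employee belongs to the population on the snapshot date."""
    if emp.entity not in IN_SCOPE_ENTITIES:
        return False
    if emp.contract_type not in IN_SCOPE_CONTRACT_TYPES:
        return False
    if emp.hire_date > snapshot:
        return False  # joined after the snapshot date
    if emp.exit_date is not None and emp.exit_date < snapshot:
        return False  # left before the snapshot date
    return True


employees = [
    Employee("E001", "DE01", "indefinite", date(2019, 3, 1)),
    Employee("E002", "DE01", "fixed-term", date(2026, 1, 15)),  # joiner after snapshot
    Employee("E003", "FR01", "indefinite", date(2020, 6, 1), date(2025, 9, 30)),  # leaver
    Employee("E004", "US01", "indefinite", date(2018, 1, 1)),  # entity out of scope
]

population = [e.employee_id for e in employees if in_scope(e)]
print(population)
```

Keeping the rule in one function means the same definition is applied to every extract, and any change to the rule is visible in one place.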

02

Which pay is considered?

Identify the relevant pay components before extracting data. The analysis needs the base wage or salary and, where relevant, complementary or variable components, all recorded as actual paid values.

03

Which context fields are needed?

Prepare the fields needed to interpret results later: job function, grade, location, tenure, working time, qualifications, experience, performance, and other documented pay-relevant factors.

Map pay before extracting data

The Directive defines pay broadly, so the dataset should not stop at base salary. Start with a pay component map. For each component, the map should state what it is, whether it is included, whether it is recorded as an amount or as an access variable (participates / does not participate), and how it is treated.

Base wage or salary

Fixed pay

  • Base wage or salary
  • Minimum or collectively agreed pay
  • Fixed allowances that form part of the base wage or salary

Complementary or variable components

Additional pay

  • Bonuses and incentives
  • Commissions
  • Overtime pay
  • Shift or location allowances
  • Long-term incentives
  • Severance or termination payments
  • Sign-on bonuses
  • Statutory pay continuation
  • Occupational pensions
  • Employee benefits
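
One way to keep the pay component map reviewable is to hold it as structured data and check it against the codes that actually appear in payroll. The sketch below is an assumption-laden illustration: the wage-type codes, the category labels, and the classification of each example component are hypothetical and are not guidance on how any component should be treated.

```python
from dataclasses import dataclass

# Hypothetical vocabularies for the map.
VALID_CATEGORIES = {"base", "variable", "out_of_scope"}
VALID_MEASURES = {"amount", "access"}  # access = participates / does not participate


@dataclass(frozen=True)
class PayComponent:
    code: str         # hypothetical payroll wage-type code
    description: str
    category: str     # one of VALID_CATEGORIES
    measure: str      # one of VALID_MEASURES
    treatment: str    # short note on how the component is handled


# Illustrative entries only; the real map is defined by the organisation.
PAY_MAP = [
    PayComponent("1000", "Base salary", "base", "amount", "actual paid value"),
    PayComponent("2000", "Annual bonus", "variable", "amount", "actual paid value"),
    PayComponent("3000", "Occupational pension", "variable", "access",
                 "participates / does not participate"),
    PayComponent("8000", "Expense reimbursement", "out_of_scope", "amount",
                 "excluded, reason documented"),
]


def validate_pay_map(pay_map):
    """Return a list of problems found in the map (empty list means none)."""
    problems = []
    seen = set()
    for comp in pay_map:
        if comp.code in seen:
            problems.append(f"{comp.code}: duplicate code")
        seen.add(comp.code)
        if comp.category not in VALID_CATEGORIES:
            problems.append(f"{comp.code}: unknown category {comp.category!r}")
        if comp.measure not in VALID_MEASURES:
            problems.append(f"{comp.code}: unknown measure {comp.measure!r}")
    return problems


def unmapped_codes(payroll_codes, pay_map):
    """Return payroll codes that have no entry in the map."""
    mapped = {comp.code for comp in pay_map}
    return sorted(set(payroll_codes) - mapped)


print(validate_pay_map(PAY_MAP))
print(unmapped_codes(["1000", "2000", "3000", "9999"], PAY_MAP))
```

Any code returned by `unmapped_codes` is a component that would otherwise slip into or out of the analysis without a documented decision.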

Consolidate and clean the source data

The next step is consolidation. HR, payroll, reward, and job architecture data often sit in different systems. The objective is one reproducible analysis dataset.

HR master data

Employee ID, gender, entity, contract type (fixed-term / indefinite), working time, tenure, and status.

Payroll

Actual paid amounts, used to derive gross annual pay, hourly pay, and complementary or variable pay components.

Reward systems

Bonuses, incentives, commissions, equity, allowances, eligibility, plan participation, and incentive performance.

Job and people data

Grade, level, job family, job profile, location, gender-neutral performance assessment, experience in role, skills, and qualifications.

  • Identify the source for each field instead of reconciling by hand later.
  • Consolidate the sources into one dataset with one row per employee.
  • Clean duplicates, missing values, inconsistent gender coding, and unrealistic annual pay or hours values.
  • Separate zero payments from missing payments, especially for variable or complementary pay.
  • Document any exclusions or privacy constraints that affect the reporting extract.
  • Freeze the extract used for reporting so the result can be reproduced.
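
The consolidation and cleaning steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, with hypothetical field names, gender codes, and sample records. It covers one row per employee, duplicate detection, gender code normalisation, the distinction between a zero payment and a missing payment, and a fingerprint for the frozen extract. Plausibility checks on pay and hours are not shown.

```python
import hashlib
import json
from typing import Optional

# Hypothetical source extracts; in practice these come from separate systems.
hr_rows = [
    {"employee_id": "E001", "gender": "F", "fte": 1.0},
    {"employee_id": "E002", "gender": "male", "fte": 0.5},
    {"employee_id": "E002", "gender": "male", "fte": 0.5},  # duplicate row
    {"employee_id": "E003", "gender": "w", "fte": 1.0},
]
payroll = {"E001": 52000.0, "E002": 24000.0, "E003": 61000.0}
bonus = {"E001": 4000.0, "E002": 0.0}  # E003 has no bonus record at all

# Hypothetical mapping from raw source codes to one consistent coding.
GENDER_CODES = {"f": "F", "female": "F", "w": "F", "m": "M", "male": "M"}


def normalise_gender(raw: Optional[str]) -> Optional[str]:
    """Map a raw gender code to the consistent coding, or None if unknown."""
    if raw is None:
        return None
    return GENDER_CODES.get(raw.strip().lower())


def consolidate(hr_rows, payroll, bonus):
    """Build one row per employee and collect data quality issues."""
    dataset = {}
    issues = []
    for row in hr_rows:
        emp_id = row["employee_id"]
        if emp_id in dataset:
            issues.append((emp_id, "duplicate HR row"))
            continue
        gender = normalise_gender(row.get("gender"))
        if gender is None:
            issues.append((emp_id, "gender missing or not recognised"))
        if emp_id not in payroll:
            issues.append((emp_id, "no payroll record"))
        dataset[emp_id] = {
            "gender": gender,
            "fte": row.get("fte"),
            "annual_pay": payroll.get(emp_id),
            "bonus": bonus.get(emp_id),  # None = missing, 0.0 = nothing paid
        }
    return dataset, issues


dataset, issues = consolidate(hr_rows, payroll, bonus)

# Fingerprint of the frozen extract: the same data always gives the same hash.
frozen = json.dumps(dataset, sort_keys=True).encode("utf-8")
fingerprint = hashlib.sha256(frozen).hexdigest()

print(issues)
print(dataset["E002"]["bonus"], dataset["E003"]["bonus"])
```

Storing the fingerprint alongside the reported figures gives a simple way to confirm later that a rerun used exactly the same extract.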

Only calculate once the extract is defensible

No formula can make up for weak data. Before running the pay gap analysis, make sure the extract, pay mapping, source logic, and governance are ready for review.

  • The population rules are defined and coherent.
  • Actual paid values are not mixed with target values.
  • Annual, hourly, and full-time-equivalent values are not used interchangeably. The analysis is based on full-time-equivalent values.
  • Every pay component is mapped to base wage or salary, mapped to complementary or variable pay, assessed on an access basis (can / cannot access), or marked out of scope.
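
To keep annual, hourly, and full-time-equivalent values apart, it can help to derive each one through a named conversion rather than reusing a single pay column. The sketch below is a simplified illustration under stated assumptions: `fte` is a hypothetical working-time fraction between 0 and 1 (1.0 = full time), `paid_hours` is a hypothetical count of hours actually paid in the period, and part-year employment is ignored, which a real rule would also need to address.

```python
def to_fte_pay(actual_annual_pay: float, fte: float) -> float:
    """Scale actual annual pay up to its full-time-equivalent value."""
    if not 0 < fte <= 1:
        raise ValueError(f"fte must be in (0, 1], got {fte}")
    return actual_annual_pay / fte


def to_hourly_pay(actual_annual_pay: float, paid_hours: float) -> float:
    """Derive hourly pay from actual annual pay and hours actually paid."""
    if paid_hours <= 0:
        raise ValueError(f"paid_hours must be positive, got {paid_hours}")
    return actual_annual_pay / paid_hours


# A half-time employee paid 24,000 has an FTE value of 48,000.
print(to_fte_pay(24000.0, 0.5))      # 48000.0
# 52,000 paid over 2,080 hours is 25 per hour.
print(to_hourly_pay(52000.0, 2080.0))  # 25.0
```

Rejecting zero or out-of-range inputs at this point surfaces unrealistic hours or working-time values before they distort the result.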