Commit 3eb5c49f authored by Ian Rogers, committed by Arnaldo Carvalho de Melo

perf test: Hybrid improvements for metric value validation test



On my alderlake I currently see for the "perf metrics value validation" test:

```
Total Test Count:  142
Passed Test Count:  139
[
Metric Relationship Error:      The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
                        is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_frontend_bound, tma_frontend_bound]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error:      The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
                        is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_backend_bound, tma_backend_bound]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
Metric Relationship Error:      The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
                        is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [tma_retiring, tma_retiring]
                        Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
Metric validation return with erros. Please check metrics reported with errors.
```

I suspect this is due to metrics for two different CPU types being
enabled at the same time. Add a -cputype option so that only the metrics
for one CPU type are tested per run. With it, the test still fails with:

```
Total Test Count:  115
Passed Test Count:  114
[
Wrong Metric Value Error:       The collected value of metric ['tma_l2_hit_latency']
                        is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
                        but expected value range is [0, 100]]
Metric validation return with errors. Please check metrics reported with errors.
```

which is a reproducible genuine error and likely requires a metric fix.
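
A minimal sketch of the two pieces this change adds to the validator: skipping metric definitions whose `Unit` does not match the requested CPU type, and passing `--cputype` to `perf stat`. The function names `select_metrics` and `build_perf_command` and the sample metric list are hypothetical; only the filter condition and the argument order mirror the diff. Nothing here runs perf.

```python
def select_metrics(metric_defs, cputype="cpu"):
    """Return lower-cased metric names that apply to the given CPU/PMU type.

    Entries without 'MetricName' are skipped. Entries that carry a 'Unit'
    different from cputype are skipped. Entries without 'Unit' are kept.
    """
    names = set()
    for m in metric_defs:
        if "MetricName" not in m:
            continue
        if "Unit" in m and m["Unit"] != cputype:
            continue
        names.add(m["MetricName"].lower())
    return names


def build_perf_command(metric, workload, cputype="cpu"):
    """Build the perf stat argument list in the same order as the diff."""
    command = ["perf", "stat", "--cputype", cputype, "-j", "-M", metric, "-a"]
    command.extend(workload.split())
    return command


if __name__ == "__main__":
    # Made-up metric definitions for illustration only.
    sample = [
        {"MetricName": "tma_frontend_bound", "Unit": "cpu_core"},
        {"MetricName": "tma_frontend_bound", "Unit": "cpu_atom"},
        {"MetricName": "example_metric_without_unit"},
    ]
    print(sorted(select_metrics(sample, "cpu_core")))
    print(" ".join(build_perf_command("tma_frontend_bound",
                                      "perf bench futex hash -r 2 -s",
                                      "cpu_core")))
```

With the filter in place, each run only sees the definitions for one CPU type, which is the situation the `-cputype` option is meant to create.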

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20250512184700.11691-2-irogers@google.com


Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 7f84f674
+9 −3
@@ -35,7 +35,8 @@ class TestError:


 class Validator:
-    def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='', workload='true', metrics=''):
+    def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='',
+                 workload='true', metrics='', cputype='cpu'):
         self.rulefname = rulefname
         self.reportfname = reportfname
         self.rules = None
@@ -43,6 +44,7 @@ class Validator:
         self.metrics = self.__set_metrics(metrics)
         self.skiplist = set()
         self.tolerance = t
+        self.cputype = cputype

         self.workloads = [x for x in workload.split(",") if x]
         self.wlidx = 0  # idx of current workloads
@@ -377,7 +379,7 @@ class Validator:

     def _run_perf(self, metric, workload: str):
         tool = 'perf'
-        command = [tool, 'stat', '-j', '-M', f"{metric}", "-a"]
+        command = [tool, 'stat', '--cputype', self.cputype, '-j', '-M', f"{metric}", "-a"]
         wl = workload.split()
         command.extend(wl)
         print(" ".join(command))
@@ -443,6 +445,8 @@ class Validator:
                 if 'MetricName' not in m:
                     print("Warning: no metric name")
                     continue
+                if 'Unit' in m and m['Unit'] != self.cputype:
+                    continue
                 name = m['MetricName'].lower()
                 self.metrics.add(name)
                 if 'ScaleUnit' in m and (m['ScaleUnit'] == '1%' or m['ScaleUnit'] == '100%'):
@@ -578,6 +582,8 @@ def main() -> None:
     parser.add_argument(
         "-wl", help="Workload to run while data collection", default="true")
     parser.add_argument("-m", help="Metric list to validate", default="")
+    parser.add_argument("-cputype", help="Only test metrics for the given CPU/PMU type",
+                        default="cpu")
     args = parser.parse_args()
     outpath = Path(args.output_dir)
     reportf = Path.joinpath(outpath, 'perf_report.json')
@@ -586,7 +592,7 @@ def main() -> None:

     validator = Validator(args.rule, reportf, debug=args.debug,
                           datafname=datafile, fullrulefname=fullrule, workload=args.wl,
-                          metrics=args.m)
+                          metrics=args.m, cputype=args.cputype)
     ret = validator.test()

     return ret
+11 −6
@@ -16,11 +16,16 @@ workload="perf bench futex hash -r 2 -s"
 # Add -debug, save data file and full rule file
 echo "Launch python validation script $pythonvalidator"
 echo "Output will be stored in: $tmpdir"
-$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}"
+for cputype in /sys/bus/event_source/devices/cpu_*; do
+	cputype=$(basename "$cputype")
+	echo "Testing metrics for: $cputype"
+	$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}" \
+		-cputype "${cputype}"
 	ret=$?
 	rm -rf $tmpdir
 	if [ $ret -ne 0 ]; then
-	echo "Metric validation return with erros. Please check metrics reported with errors."
+		echo "Metric validation return with errors. Please check metrics reported with errors."
 	fi
+done
 exit $ret
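
For readers more at home in the validator's language, the PMU discovery done by the shell loop above can be sketched in Python. This is an illustration only: `list_cpu_pmus` is a hypothetical helper that is not part of the commit, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def list_cpu_pmus(sysfs_root="/sys/bus/event_source/devices"):
    """Return the sorted names of entries matching cpu_* under sysfs_root.

    Each name (for example cpu_core or cpu_atom on a hybrid machine) is what
    the shell loop passes to the validator via -cputype.
    """
    root = Path(sysfs_root)
    return sorted(p.name for p in root.glob("cpu_*"))


if __name__ == "__main__":
    for cputype in list_cpu_pmus():
        print(f"Testing metrics for: {cputype}")
```

Unlike an unquoted shell glob, which is left as the literal pattern when nothing matches (unless `nullglob` is set), `Path.glob` yields nothing in that case, so a caller of this sketch would have to decide what to do on a system without `cpu_*` entries.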