mustansir14
left a comment
Thanks for fixing this long overdue bug!
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c6ed135.
gaborbernat
left a comment
Validated the fix end-to-end against a real internal repo (DCA/PreppyService preppy/modules/config.py) that had 4 identical [FAIL] values at lines 87, 95, 103, 111.
Before (upstream/main): all 4 reported as line 87
After (this PR): correctly reported as 87, 95, 103, 111
Tested with trufflehog filesystem --json --config=... --include-detectors=CustomRegex --no-verification. Both the synthetic case (3 identical secrets) and the real-world case pass.

Summary
When the same secret appears multiple times in a chunk, every finding was reported with the line number of the first occurrence. This fixes it so each finding reports the line where it actually sits.
Closes #2502
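To make the symptom concrete, here is a minimal Go sketch of how a lookup that only knows the secret bytes ends up reporting the first occurrence's line for every duplicate. The helper name lineOfFirstMatch, the sample data, and the 1-based line numbering are assumptions made for illustration; this is not the engine's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineOfFirstMatch is a hypothetical helper, not the engine's code. It mimics
// a lookup that only receives the secret bytes: bytes.Cut splits at the first
// occurrence, so the newline count before the match never changes.
func lineOfFirstMatch(data, secret []byte) (int, bool) {
	before, _, found := bytes.Cut(data, secret)
	if !found {
		return 0, false
	}
	return 1 + bytes.Count(before, []byte("\n")), true
}

func main() {
	// The same secret sits on lines 2 and 5 of this sample chunk.
	data := []byte("a = 1\nkey = FAKE_SECRET_ABC123XYZ\nb = 2\nc = 3\nkey = FAKE_SECRET_ABC123XYZ\n")
	secret := []byte("FAKE_SECRET_ABC123XYZ")

	// Two findings carry identical secret bytes, so both lookups agree.
	for i := 0; i < 2; i++ {
		line, _ := lineOfFirstMatch(data, secret)
		fmt.Printf("finding %d: line=%d\n", i, line)
	}
}
```

Both findings print line=2 here, because nothing in the call distinguishes the second occurrence from the first.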
Root cause
FragmentLineOffset in pkg/engine/engine.go uses bytes.Cut(chunk.Data, secret), which always splits at the first match. Because each detectors.Result only carries the secret bytes (not its byte position in the chunk), every duplicate result went through this first-match lookup and inherited the same line number.
Duplicates survive to this function under normal scans. CleanResults deduplication only runs under --only-verified, or when a detector's ShouldCleanResultsIrrespectiveOfConfiguration() returns true. By default, any detector that emits one result per regex match hits the bug.
Fix
Localized to the engine. No detector changes required.
pkg/detectors/detectors.go: add a private chunkOffset/chunkOffsetSet pair to Result with SetChunkOffset/ChunkOffset/HasChunkOffset accessors.
pkg/engine/engine.go: add AssignDuplicateLineOffsets(chunk, results), which groups results by secret value, walks the chunk with bytes.Index to find each successive occurrence, and stamps the offset onto the corresponding result. Called once per result batch inside detectChunk, before results are dispatched. Unique secrets are skipped (zero overhead).
FragmentLineOffset: gains a fast path. When result.HasChunkOffset() is true, compute the line directly from the pre-assigned offset and run the ignore-tag check against that occurrence's line. The original bytes.Cut logic is preserved as a fallback for any caller that didn't go through AssignDuplicateLineOffsets, so the change is backward compatible.
pkg/detectors/datadogapikey/datadogapikey_test.go and pkg/custom_detectors/custom_detectors_test.go compared detectors.Result values via cmp.Diff with IgnoreFields enumerating the then-existing unexported fields. Switched them to cmpopts.IgnoreUnexported(detectors.Result{}) so they tolerate the new private fields and are future-proof against similar additions.
A side effect worth calling out: the trufflehog:ignore check now runs per-occurrence instead of always inspecting the first occurrence's line. That's a real behavioral improvement and is covered explicitly by a test.
Why not fix it in detectors?
Threading byte offsets through every detector's FromData would touch 700+ implementations and require switching most of them from regex.FindAllStringSubmatch to FindAllStringIndex. The engine is the single place in the pipeline that has both the full chunk data and the full result batch, so a localized engine-level fix is the minimal correct change.
Performance
Result struct: 160 → 176 bytes (+16 bytes for the int64 + bool + alignment padding). Results are short-lived per-chunk, so there is no meaningful memory pressure.
AssignDuplicateLineOffsets: O(N) with one map allocation per call. The inner bytes.Index loop only runs for groups with duplicates.
End-to-end verification
Reproduced the reporter's scenario against a fresh filesystem scan using a CustomRegex detector and a file containing FAKE_SECRET_ABC123XYZ on lines 2, 5, and 8:
Before the fix: line=2, line=2, line=2
After the fix: line=2, line=5, line=8
Test plan
TestFragmentLineOffset_DuplicateSecrets: regression test for the bug (fails without the fix, passes with it)
TestAssignDuplicateLineOffsets: unit test covering unique secrets, duplicates, and result ordering
TestFragmentLineOffset_DuplicateSecretsWithIgnoreTag: verifies per-occurrence trufflehog:ignore handling
TestFragmentLineOffset and TestFragmentLineOffsetWithPrimarySecret* still pass (fallback path unchanged)
go test ./pkg/engine/... and go test ./pkg/detectors/... pass
CustomRegex detector matches the expected line numbers
Checklist:
Tests passing (make test-community)?
Lint passing (make lint; this requires golangci-lint)?
Note
Medium Risk
Touches core engine result line-number assignment by adding per-result chunk offsets; incorrect offset calculation could shift reported lines or ignore-tag behavior, but changes are localized and well-covered by new tests.
Overview
Fixes incorrect line-number reporting when the same secret appears multiple times within a single chunk by precomputing and storing a per-result byte offset.
detectors.Result gains private chunk-offset state, detectChunk now calls AssignDuplicateLineOffsets before dispatch, and FragmentLineOffset uses the offset (when set) to compute the correct occurrence's line and apply trufflehog:ignore per occurrence. Tests are added for duplicate-secret line offsets (including ignore-tag cases), and a couple detector tests switch to cmpopts.IgnoreUnexported(detectors.Result{}) to tolerate the new private fields.
Reviewed by Cursor Bugbot for commit 485455a.
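The group-by-secret and successive-occurrence idea described in this PR can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code in pkg/engine/engine.go: the finding type, assignOffsets, and lineAt are hypothetical names, lines are counted from 1, and the search resumes after the end of each match (non-overlapping occurrences), which may differ from the actual implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// finding is a hypothetical stand-in for a detector result: it carries only
// the secret bytes plus an offset slot that starts out unset.
type finding struct {
	secret    []byte
	offset    int64
	hasOffset bool
}

// assignOffsets groups findings by secret and, for groups with duplicates,
// walks data with bytes.Index so the k-th finding gets the k-th occurrence.
func assignOffsets(data []byte, findings []*finding) {
	groups := make(map[string][]*finding)
	for _, f := range findings {
		groups[string(f.secret)] = append(groups[string(f.secret)], f)
	}
	for secret, group := range groups {
		if len(group) < 2 || len(secret) == 0 {
			continue // unique secrets are left for the first-match fallback
		}
		pos := 0
		for _, f := range group {
			idx := bytes.Index(data[pos:], []byte(secret))
			if idx < 0 {
				break // more findings than occurrences; leave the rest unset
			}
			f.offset, f.hasOffset = int64(pos+idx), true
			pos += idx + len(secret) // assumption: occurrences do not overlap
		}
	}
}

// lineAt converts a byte offset into a 1-based line number.
func lineAt(data []byte, offset int64) int {
	return 1 + bytes.Count(data[:offset], []byte("\n"))
}

func main() {
	data := []byte("x\nFAKE_SECRET_ABC123XYZ\ny\nz\nFAKE_SECRET_ABC123XYZ\n")
	fs := []*finding{
		{secret: []byte("FAKE_SECRET_ABC123XYZ")},
		{secret: []byte("FAKE_SECRET_ABC123XYZ")},
	}
	assignOffsets(data, fs)
	for _, f := range fs {
		fmt.Println(lineAt(data, f.offset))
	}
}
```

With the sample data above, the two findings resolve to lines 2 and 5. Pairing findings with occurrences by position relies on the detector emitting results in the same order as the matches appear in the chunk, which is an assumption of this sketch.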