x-pack/filebeat/input/awss3: fix document ID construction when using csv decoder #42019

Merged: 3 commits, Dec 18, 2024

Conversation

Contributor

@efd6 efd6 commented Dec 12, 2024

Proposed commit message

See title.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@efd6 efd6 added Filebeat Filebeat bugfix Team:Security-Service Integrations Security Service Integrations Team backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Dec 12, 2024
@efd6 efd6 self-assigned this Dec 12, 2024
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Dec 12, 2024
Contributor

mergify bot commented Dec 12, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Dec 12, 2024
@efd6 efd6 force-pushed the s3_valueDecoder_offset branch from f456d3c to 849ff10 Compare December 12, 2024 19:52
@efd6 efd6 changed the title x-pack/filebeat/input/awss3: fix document ID construction when using … x-pack/filebeat/input/awss3: fix document ID construction when using https://github.com/elastic/beats/pull/42019 Dec 12, 2024
@efd6 efd6 changed the title x-pack/filebeat/input/awss3: fix document ID construction when using https://github.com/elastic/beats/pull/42019 x-pack/filebeat/input/awss3: fix document ID construction when using csv decoder Dec 12, 2024
@efd6 efd6 marked this pull request as ready for review December 13, 2024 05:28
@efd6 efd6 requested a review from a team as a code owner December 13, 2024 05:28
@elasticmachine
Collaborator

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@efd6 efd6 enabled auto-merge (squash) December 16, 2024 05:41
-	event.SetID(objectID(p.s3ObjHash, offset))
+	if offset >= 0 {
+		event.Fields.Put("log.offset", offset)
+		event.SetID(objectID(p.s3ObjHash, offset))
Contributor

Why do we also skip setting the event ID? Is there something that uses this for idempotency?

Contributor Author

That was the stimulus for the change. When there is no way to know the offset, we would end up making an @metadata._id that is shared by all documents from the same object, so the index would wrongly treat them as duplicates and data would be lost. By allowing a way to signal to the input that documents from the data source cannot be differentiated, we let downstream know that it needs to fill the gap.
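
To illustrate the failure mode described above, here is a minimal, self-contained sketch rather than the actual Beats code. The `objectID` format string, the `assignID` helper, and the `index` type that simulates an ID-keyed store are all assumptions made for illustration; only the `offset >= 0` guard comes from the diff in this PR.

```go
package main

import "fmt"

// objectID is a stand-in for the helper named in the diff. The exact
// format used by Beats is an assumption here.
func objectID(objectHash string, offset int64) string {
	return fmt.Sprintf("%s-%012d", objectHash, offset)
}

// assignID (hypothetical) mirrors the guarded logic in the diff: an ID is
// only produced when the offset is known. The boolean reports whether an
// ID was set.
func assignID(objectHash string, offset int64) (string, bool) {
	if offset < 0 {
		return "", false
	}
	return objectID(objectHash, offset), true
}

// index (hypothetical) simulates an ID-keyed document store: a repeated ID
// collapses into a single document, and a document without an ID gets a
// generated one.
type index struct {
	docs map[string]string
	auto int
}

func (ix *index) put(id string, hasID bool, doc string) {
	if !hasID {
		ix.auto++
		id = fmt.Sprintf("auto-%d", ix.auto)
	}
	ix.docs[id] = doc
}

func main() {
	rows := []string{"a,1", "b,2", "c,3"}
	const unknown = int64(-1) // signals "no offset available"

	before := &index{docs: map[string]string{}}
	after := &index{docs: map[string]string{}}
	for _, r := range rows {
		// Old behaviour: every row from the object gets the same ID.
		before.put(objectID("abc123", unknown), true, r)
		// Guarded behaviour: no ID is set, so downstream assigns one.
		id, ok := assignID("abc123", unknown)
		after.put(id, ok, r)
	}
	fmt.Println(len(before.docs), len(after.docs)) // prints "1 3"
}
```

In this sketch the unguarded path keeps only one of the three rows, while the guarded path keeps all three because the store is left to generate distinct IDs.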

Contributor

Thanks for the clarification. In my first pass I missed part of the context.

Contributor

mergify bot commented Dec 16, 2024

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b s3_valueDecoder_offset upstream/s3_valueDecoder_offset
git merge upstream/main
git push upstream s3_valueDecoder_offset

@efd6 efd6 force-pushed the s3_valueDecoder_offset branch from 82b42a7 to de57f02 Compare December 16, 2024 19:50

@efd6 efd6 merged commit f310728 into elastic:main Dec 18, 2024
20 of 22 checks passed
mergify bot pushed a commit that referenced this pull request Dec 18, 2024
mergify bot pushed a commit that referenced this pull request Dec 18, 2024
mergify bot pushed a commit that referenced this pull request Dec 18, 2024
efd6 added a commit that referenced this pull request Dec 18, 2024
…csv decoder (#42019) (#42099)

(cherry picked from commit f310728)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
efd6 added a commit that referenced this pull request Dec 18, 2024
…csv decoder (#42019) (#42101)

(cherry picked from commit f310728)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
efd6 added a commit that referenced this pull request Jan 27, 2025
efd6 added a commit that referenced this pull request Jan 27, 2025
…csv decoder (#42019) (#42100)

(cherry picked from commit f310728)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify bugfix Filebeat Filebeat Team:Security-Service Integrations Security Service Integrations Team
4 participants