#1698 OpenSearch: Make waitAck cache configurable via cache specifica… by HyemIin · Pull Request #1700 · apache/stormcrawler · GitHub
Skip to content

#1698 OpenSearch: Make waitAck cache configurable via cache specifica…#1700

Merged
jnioche merged 9 commits into
apache:mainfrom
HyemIin:refactor/#1698-make-waitAck-cache-configurable
Oct 23, 2025
Merged

#1698 OpenSearch: Make waitAck cache configurable via cache specifica…#1700
jnioche merged 9 commits into
apache:mainfrom
HyemIin:refactor/#1698-make-waitAck-cache-configurable

Conversation

@HyemIin

@HyemIin HyemIin commented Oct 20, 2025

Copy link
Copy Markdown
Contributor

Thank you for contributing to Apache StormCrawler.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes

  • Is there a issue associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

  • Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?

For code changes

  • Have you ensured that the full suite of tests is executed via mvn clean verify?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file?

Note

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

…cification

This commit replaces the hardcoded waitAck Caffeine cache configuration
in StatusUpdaterBolt with a configurable cache specification string.
It introduces the parameter `opensearch.status.waitack.cache.spec`,
allowing users to control cache behavior (size, expiration) via config.
@HyemIin

HyemIin commented Oct 20, 2025

Copy link
Copy Markdown
Contributor Author

@rzo1 rzo1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I left some comments.

Comment thread external/opensearch/opensearch-conf.yaml Outdated
package org.apache.stormcrawler.opensearch.bolt;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.Assert.*;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid star imports - also for the other occurences.

}

@Test
@Timeout(value = 2, unit = TimeUnit.MINUTES)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this test removed? If it fails, we should adjust it (or provide a reasoning why it was dropped)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my bad.
I was trying to fix the Jacoco coverage issue and thought the test case was missing.
I’ll restore the original test case and include them in this commit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Thanks for your help on the issue! :)

try {
bolt.prepare(conf, mockContext, mockCollector);
} catch (RuntimeException e) {
// 연결 실패 시 예외 분기 발생 → Jacoco branch coverage 확보

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop foreign language comments here please?

assertTrue(e.getMessage().contains("Can't connect"));
}

Field field = StatusUpdaterBolt.class.getDeclaredField("waitAck");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we trying to achieve here? Why do we check for the classname? If the class can't be initalized, the bolt is merely useless (so we should fail early), so no need for such a test imho.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was only intended to verify that the value defined in the YAML config was correctly applied.
It was temporarily added while debugging a Jacoco coverage issue, so I’ll remove it and include the fix in this commit.

@rzo1 rzo1 requested review from jnioche, mvolikas and sigee October 21, 2025 07:06
@rzo1 rzo1 added this to the 3.5.1 milestone Oct 21, 2025
- update expireAfterWrite duration
- avoid star imports
- restore previous test code
- remove Korean comments
- delete unclear test case
@HyemIin HyemIin requested a review from rzo1 October 21, 2025 12:32

@jnioche jnioche left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done a bit differently

  1. load the value of topology.message.timeout.secs in statusupdaterbolt and use that as default.
  2. if opensearch.status.waitack.cache.spec is specified use it instead
  3. comment out the config in opensearch-conf.yaml and explain the logic

Bascially we'll use whatever is set as a message timeout unless specified otherwise

what do you think?

@HyemIin

HyemIin commented Oct 22, 2025

Copy link
Copy Markdown
Contributor Author

@jnioche

I agree that your suggested approach makes more sense.
I’ve updated and committed the code based on your comment — can you please check if it aligns with what you had in mind?

Also, I have one question: if "topology.message.timeout.secs" is not explicitly defined in "opensearch-conf.yaml", will calling it by key automatically get the default value (300s)?

@rzo1

rzo1 commented Oct 22, 2025

Copy link
Copy Markdown
Contributor

Yes. You have typically a cascade of YAML files when submitting a topology via flux for example, so it doesn't need to be defined in the OS specific config.

@HyemIin HyemIin requested a review from jnioche October 22, 2025 12:49
Comment thread external/opensearch/opensearch-conf.yaml Outdated
@jnioche

jnioche commented Oct 22, 2025

Copy link
Copy Markdown
Contributor

@HyemIin thanks for your work on this BTW
the action failed due to a problem with the code formatting
would you mind running

mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false
prior to committing changes? Thanks

@HyemIin HyemIin requested a review from jnioche October 22, 2025 13:22
@rzo1

rzo1 commented Oct 22, 2025

Copy link
Copy Markdown
Contributor

@jnioche jnioche left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks @HyemIin

@jnioche jnioche merged commit 38fa1a0 into apache:main Oct 23, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants