Improve robustness of WARC generation by jnioche · Pull Request #1342 · apache/stormcrawler · GitHub

Improve robustness of WARC generation #1342

Merged: jnioche merged 5 commits into main from impovWARCrobustness on Oct 5, 2024


jnioche (Contributor) commented Oct 5, 2024

I am seeing exceptions when generating WARC metadata:

Caused by: java.lang.NullPointerException
	at org.apache.stormcrawler.warc.MetadataRecordFormat.format(MetadataRecordFormat.java:69) ~[stormjar.jar:?]

Presumably we are getting these because the tuple has no metadata, but a single such failure breaks the whole topology. This PR makes WARC generation more robust: if an exception is thrown while formatting a WARC record, an error message is generated and the tuple is skipped.
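
The skip-on-failure behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `SafeWarcFormatting` and the methods `formatMetadata`, `safeFormat` and `formatAll` are hypothetical names invented for illustration, a plain `Map` stands in for the tuple's metadata, and the error is written to `System.err` instead of a logger.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from StormCrawler.
class SafeWarcFormatting {

    /** Stand-in for a record formatter that fails when there is no metadata. */
    static byte[] formatMetadata(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        // Throws NullPointerException when metadata is null.
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Runs the formatter; on a runtime exception, reports it and returns empty. */
    static <T> Optional<byte[]> safeFormat(Function<T, byte[]> formatter, T input) {
        try {
            return Optional.ofNullable(formatter.apply(input));
        } catch (RuntimeException e) {
            System.err.println("Skipping tuple, formatting failed: " + e);
            return Optional.empty();
        }
    }

    /** Formats every input, silently dropping the ones that could not be formatted. */
    static List<byte[]> formatAll(List<Map<String, String>> tuples) {
        List<byte[]> records = new ArrayList<>();
        for (Map<String, String> md : tuples) {
            safeFormat(SafeWarcFormatting::formatMetadata, md).ifPresent(records::add);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put("fetch.statusCode", "200");

        List<Map<String, String>> tuples = new ArrayList<>();
        tuples.add(ok);
        tuples.add(null); // simulates a tuple without metadata
        tuples.add(ok);

        System.out.println(formatAll(tuples).size() + " records written");
    }
}
```

In this sketch the bad input costs one record rather than the whole run: the failure is reported once and the remaining inputs are still processed.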

Signed-off-by: Julien Nioche <julien@digitalpebble.com>
jnioche added this to the 3.1.1 milestone on Oct 5, 2024
jnioche merged commit cec4083 into main on Oct 5, 2024
jnioche deleted the impovWARCrobustness branch on October 5, 2024 12:10