distracteddev · 2013-05-31T21:53:14Z

I was getting internal client errors because the Queue was being populated with queueItems that had port 80 while the protocol was listed as https. This led to a call to https.get({port: 80}), which throws an OpenSSL error:

[Error: 140735148347776:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:766:
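For illustration, here is a minimal sketch of picking the port from the protocol instead of forcing 80. The helper name `requestOptionsFor` is hypothetical and this is not simplecrawler's actual code; it only assumes Node's built-in `url` module.

```javascript
const { URL } = require("url");

// Hypothetical helper (not part of simplecrawler): build request options
// whose port matches the protocol unless the URL names a port explicitly.
function requestOptionsFor(rawUrl) {
  const parsed = new URL(rawUrl);
  const isHttps = parsed.protocol === "https:";

  // parsed.port is an empty string when the URL has no explicit port,
  // so fall back to the protocol's default instead of a hard-coded 80.
  const port = parsed.port ? Number(parsed.port) : (isHttps ? 443 : 80);

  return {
    protocol: parsed.protocol,
    host: parsed.hostname,
    port: port,
    path: parsed.pathname + parsed.search
  };
}
```

With this, an https URL without an explicit port ends up on 443, so the TLS handshake is never attempted against a plain HTTP listener.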

cgiffard · 2013-06-03T00:32:58Z

Stop overriding default port configurations

cgiffard · 2013-06-03T00:34:53Z

I'll push to npm soonish, but I'm trying to resolve an error/semantic problem (#39) before I do so. :)

distracteddev · 2013-06-03T19:26:50Z

No worries and thanks for the great crawler :) Autodiscovery saved me a ton of time.

One issue I'm still having, however, is that when I try to discover 50+ links, the app ends up in a state where

if (crawler._openRequests >= crawler.maxConcurrency)

always evaluates to true and the open requests never time out (even after setting the timeout option to something small like 5 or 10 seconds), so the app is left spinning, waiting on requests that never seem to end.
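To make the symptom concrete, here is a rough sketch of a concurrency gate built around the same kind of check. The class `ConcurrencyGate` and its members are hypothetical names for illustration only, not simplecrawler's internals.

```javascript
// Hypothetical illustration, not simplecrawler's code: a gate that caps
// in-flight tasks and frees the slot once a task settles.
class ConcurrencyGate {
  constructor(maxConcurrency) {
    this.maxConcurrency = maxConcurrency;
    this.openRequests = 0;
  }

  // Same shape as the check quoted above: true when no slot is free.
  isFull() {
    return this.openRequests >= this.maxConcurrency;
  }

  // Runs task() if a slot is free; returns null when the gate is full.
  run(task) {
    if (this.isFull()) return null;
    this.openRequests += 1;
    return Promise.resolve()
      .then(task)
      .finally(() => {
        // Released on success and on failure alike.
        this.openRequests -= 1;
      });
  }
}
```

The slot is only released when the task settles. A request that never responds therefore keeps its slot indefinitely, and once every slot is held that way the check stays true forever.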

I've tried to fix it for hours to no avail. As such, I was thinking of porting the Queue to use Async's built-in queue implementation to handle making and receiving requests. Any thoughts on this?

cgiffard · 2013-06-03T22:29:35Z

(BTW I totally forgot to tell you, but 0.2.7 is on npm now. :))

With regard to that issue, perhaps I need to be tighter with the timeouts. I'm pretty sure the problem does not relate to the queue, and I'd like it to retain a relatively general interface so that a queue can be implemented, for example, in Redis, and shared between multiple machines running the same crawl between them.

I'll have a look into whether there's a simple way to fix this problem for you - that's most definitely 100% a bug. :)

distracteddev · 2013-06-03T23:53:32Z

I looked into it more and I think the Queue is functioning properly; it's just that some requests were taking 150+ seconds to respond. I also realized you aren't using the timeout anywhere, so I added:

// Around Line 700 of crawler.js
clientRequest.setTimeout(crawler.timeout, function () {
  console.log('TIMEOUT REACHED', queueItem.url);
  clientRequest.abort();
});

and that fixed my problem right up once I set the timeout to 10 seconds.
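For anyone who wants to try the idea outside the crawler, here is a self-contained sketch using only Node's `http` module. The helper name `getWithTimeout` is made up for this example, and it uses `destroy()` where the snippet above uses `abort()`, since newer Node versions deprecate `abort()`.

```javascript
const http = require("http");

// Hypothetical helper: issue a GET and give up after timeoutMs of socket
// inactivity. The callback is invoked at most once.
function getWithTimeout(options, timeoutMs, callback) {
  let finished = false;
  const done = (err, res) => {
    if (finished) return;
    finished = true;
    callback(err, res);
  };

  const req = http.get(options, (res) => {
    res.resume(); // drain the body so "end" fires
    res.on("end", () => done(null, res));
  });

  // Fires when the socket has been idle for timeoutMs.
  req.setTimeout(timeoutMs, () => {
    // Tears down the socket and emits "error" with this Error.
    req.destroy(new Error("request timed out"));
  });

  req.on("error", (err) => done(err));
  return req;
}
```

Because the timeout path ends in an "error" event, whatever bookkeeping already runs on request errors should also run for stalled requests.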

cgiffard · 2013-06-04T01:47:39Z

Sure, the missing timer is the bug. :)

I've got a bit of stuff on right now but I'll see if I can get to this later today. Thanks!

cgiffard · 2013-06-04T01:49:42Z

Fix bug where https requests were being sent with port:80 option which would end up in a OpenSSH error.

2186635

cgiffard added a commit that referenced this pull request Jun 3, 2013

Merge pull request #40 from distracteddev/master

845faca

Stop overriding default port configurations

cgiffard merged commit 845faca into simplecrawler:master Jun 3, 2013

SaltwaterC mentioned this pull request Jul 3, 2013

Strange SSL error when trying to get this URL SaltwaterC/http-request#19

Closed

Stop overriding default port configurations #40

cgiffard merged 1 commit into simplecrawler:master from distracteddev:master