Authentication for secure source is not working Norconex HttpCollector #203

Closed
shubhamsamy opened this issue Dec 22, 2015 · 6 comments

Comments

@shubhamsamy

Hi,
I need to crawl a secure site. I am trying the following configuration, but it fails to authenticate. I also tried version 2.3, but I still have no clue why it does not even attempt authentication. Nothing in the log indicates whether it tries to authenticate or not. Initially I had an issue with the certificate, so I changed the config to add <trustAllSSLCertificates>true</trustAllSSLCertificates>
After adding it, the certificate error went away.

<crawlerDefaults>
  <httpClientFactory>
    <authMethod>form</authMethod>
    <authUsernameField>username</authUsernameField>
    <authPasswordField>password</authPasswordField>
    <authUsername></authUsername>
    <authPassword></authPassword>
    <authURL></authURL>
  </httpClientFactory>
</crawlerDefaults>
<crawlers>
  <crawler id="Norconex eforge">
    <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
      <!-- <url></url>-->
      <urlsFile>E:\search\spiderURL.txt</urlsFile>
    </startURLs>
.......

The above is an extract from my configuration.
Please have a look and let me know what the issue could be.
Thanks!
Shubham

@essiembre
Contributor

Please format any XML you paste on GitHub using Markdown; otherwise your XML tags do not show. I edited your question this time to make sure we can see the XML.

I would need the actual URL to confirm whether what you have is OK. Your full config would also help me reproduce the problem.

With what you have pasted, I can see you are missing the values for authUsername, authPassword, and authURL. Did you just remove them before pasting here and your real config has them? You need them.
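For reference, a filled-in version of the form-authentication block might look like the sketch below. The element names are the ones from the configuration above; every value is a placeholder invented for illustration (the user name, the password, and the `intranet.example.com` URL are hypothetical), and the sketch assumes `authURL` should be the address the login form submits to.

```xml
<httpClientFactory>
  <!-- Use the crawler's form-based authentication -->
  <authMethod>form</authMethod>
  <!-- Names of the login form's input fields -->
  <authUsernameField>username</authUsernameField>
  <authPasswordField>password</authPasswordField>
  <!-- Hypothetical credentials: replace with your own -->
  <authUsername>crawler-user</authUsername>
  <authPassword>CHANGE_ME</authPassword>
  <!-- Hypothetical URL: assumed to be where the login form is submitted -->
  <authURL>https://intranet.example.com/login</authURL>
</httpClientFactory>
```

If the field names in the login page's HTML differ from `username` and `password`, the two `auth...Field` values would need to match them.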

Without pasting credentials, can you share the URL? You can always email the URL to me if it is too sensitive to paste here (you can get my email from my GitHub profile).

@shubhamsamy
Author

Thanks for your response.
Yes, I removed the values for those fields before pasting, but they are present in my actual configuration.
The URL I need to crawl is an intranet site and is not accessible from outside. Please find attached the configuration I am currently using. I have removed some sensitive information.
Thanks,
Shubham
config.txt

@essiembre
Contributor

Do you get any error? Authentication mechanisms can vary greatly. The form-based authentication supported is fairly basic. If yours has custom elements in it or uses a more complex mechanism (e.g. SAML), chances are it is currently not supported. Can you provide detailed information about the authentication technology being used?

Without that info or being able to reproduce, I am afraid I can't help much on this support channel.

One option is to write your own solution for it. A good place to start is by extending GenericHttpClientFactory#authenticateUsingForm
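As a rough sketch of how such a custom implementation could be plugged in: assuming you write a subclass of GenericHttpClientFactory that overrides the form authentication, the configuration would point to it through the `class` attribute. The class name `com.example.crawler.CustomFormAuthHttpClientFactory` is hypothetical, and the sketch assumes the subclass still reads the same `auth...` elements as its parent.

```xml
<!-- Hypothetical class name: replace with your own subclass of
     GenericHttpClientFactory, available on the collector's classpath -->
<httpClientFactory class="com.example.crawler.CustomFormAuthHttpClientFactory">
  <authMethod>form</authMethod>
  <authUsernameField>username</authUsernameField>
  <authPasswordField>password</authPasswordField>
  <!-- Values left empty, as in the original extract -->
  <authUsername></authUsername>
  <authPassword></authPassword>
  <authURL></authURL>
</httpClientFactory>
```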

Another option is to contact Norconex for a private analysis of the authentication issue in your environment (via Remote Desktop or otherwise).

essiembre added a commit that referenced this issue Jan 6, 2016
authentication issues. Maven dependency updates: Apache HttpClient
4.5.1. (Github #203)
@essiembre
Contributor

I am closing due to lack of feedback and not being able to reproduce. If you have more information to share that could help troubleshoot, feel free to re-open or create a new ticket.

@reddyreddy16

Is SAML-based authentication supported in any release?

@essiembre
Contributor

No, it is not, but there is now a feature request for it: #421
