Nutch - loving it!
We've just put together a few search engines (indexes really) for various niches and the more we delve into it, the better it gets.
The problems we had getting Java, Tomcat and Apache to work together were... challenging. They really do seem to make it hard work: the Tomcat support site is utterly useless, so it's down to creative web searching to find solutions.
It's been a while since I got hands-on with Linux and it's amazing how old commands, shortcuts, etc. come back to you. I was using vi like it was yesterday (it's been 10 years since I last used it!). I think that probably says more about vi than my own skills... There's something about using the command line that makes you feel more in control. Yes, it's far easier to make a total mess of a server, but then it's also far easier to get it to do your bidding.
What was also interesting was that there were more guides on how to install Tomcat on Windows than on Linux - strange, huh?
What's also interesting is the very, very strange way in which Linux-based software handles configuration. There are files all over the place and this whole cascading config model is just a pain. You have to check at least 2 places to ensure a setting will go through (i.e. the default and then the site/application-specific config). Surely it's best to have overrides in one config file?
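To make the "check at least 2 places" problem concrete, here's a rough Python sketch of how a two-layer config resolves. It assumes the Hadoop-style property XML that Nutch uses for its default and site files. The property names and values are only illustrative, and `parse_properties` and `resolve` are made-up helper names, not part of Nutch.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-ins for a default config and a site-specific override file.
# The property names and values here are assumptions for the example only.
DEFAULT_XML = """<configuration>
  <property><name>http.agent.name</name><value></value></property>
  <property><name>db.max.outlinks.per.page</name><value>100</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>http.agent.name</name><value>example-crawler</value></property>
</configuration>"""


def parse_properties(xml_text):
    """Return {name: value} for every <property> in one config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = (value or "").strip()
    return props


def resolve(layers):
    """Merge config layers in order; later layers override earlier ones.

    Each setting is stored as (value, layer_label) so you can see which
    file actually supplied the value that ends up being used.
    """
    effective = {}
    for label, xml_text in layers:
        for name, value in parse_properties(xml_text).items():
            effective[name] = (value, label)
    return effective


EFFECTIVE = resolve([("default", DEFAULT_XML), ("site", SITE_XML)])

for name, (value, source) in sorted(EFFECTIVE.items()):
    print(f"{name} = {value!r} (from {source})")
```

Printing where each value came from is the useful bit: it does the "which file won?" check for you instead of opening both files by hand.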
We've been thinking of putting together a "build your own search engine" type platform so we had a look around at what's out there.
Rollyo looks pretty nice but it's based on Yahoo Search and let's face it, that is not a good thing. How Yahoo can still get it so wrong after all this time baffles me. Forget the results, they don't even crawl sites properly.
Google CSE is a pretty smart application that's been very well thought out and well implemented but I think there's room for a more in-depth search engine (by that I mean something that allows a deeper, more thorough crawl). Google crawl far more now than they used to but they still don't go far enough on some sites.
I'm currently working out how difficult it would be to put a front end on our Nutch installation to allow people to at least have a play with it. People can then see if it's worth the effort to get it installed, etc. before they risk their sanity with web.xml not being where it should be or with missing Java libraries!
We'll be putting a few sites live in the next week or so and we'll post them on here for our regular viewers to take a look.