Index non-FarCry content with the FarCry Solr Pro plugin?


#1

Hello All,

I have at least one custom CF app that I would like to have indexed by the FarCry Solr Pro plugin, with the goal that its pages show up in the main site’s search results.

I’ve had this simmering in the back of my head for a little while, and I’m assuming I can either bring Solr to the app or the app to Solr. Of those, the first semi-reasonable idea to form was creating a content type in FarCry that mirrors the important columns of our custom CF app; the plugin would then take it from there. Thinking it through, this sounds very hacky. I don’t see any way to go the other direction though (e.g. point the Solr Pro plugin at the custom app).

Is there already a way to do this? If not, is the direction I’m going laughably silly?

I guess I could dive into Solr directly, but I don’t want to go behind this convenient plugin if I can avoid it.


#2

We’ve done exactly what you are suggesting - creating a custom type and importing the data into FarCry (and thus the Solr Pro plugin). I’ve done this with several clients. In fact I updated the plugin a few years ago to help support this functionality. In my case, I was importing about 7.5 million records. Both FarCry and Solr had no problem dealing with the extra data (it was previously a problem indexing that much data until I made the necessary updates to the plugin).

What I advise is writing a simple import script to first import the data into FarCry, but do not enable index on save in the Solr content type. Once all of the data is imported into FarCry, index it in Solr in small batches (the lower the batch size, the faster it runs). The batch size depends on how much RAM you give Solr, so you need to play with a couple of test batch indexes to find the sweet spot on that server. At this point you could run each batch manually, but I just cheat and write a simple script that loops over (n) records (the batch size), indexes them, then uses cflocation to go back to the same file with the next record count as a URL variable (so I know which record to query from next and also when to stop the loop). I’ve used this method many times and it works great.
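For anyone who wants to see the batch-and-redirect bookkeeping spelled out, here is a minimal sketch of just the offset arithmetic. It is written in Python purely for illustration; the real thing would be a CFML template that redirects to itself. The names `index_batch`, `next_url`, `run_all`, the `start` URL variable, the `reindex.cfm` file name, and the `index_fn` callback are all hypothetical stand-ins, not part of FarCry or the Solr Pro plugin.

```python
from typing import Callable, Optional, Sequence


def index_batch(
    records: Sequence[dict],
    start: int,
    batch_size: int,
    index_fn: Callable[[Sequence[dict]], None],
) -> Optional[int]:
    """Index one batch beginning at `start` (0-based).

    Returns the offset of the next batch, or None when nothing is left,
    which is the signal to stop redirecting.
    """
    batch = records[start:start + batch_size]
    if batch:
        # Hypothetical stand-in for whatever actually pushes records to Solr.
        index_fn(batch)
    next_start = start + batch_size
    return next_start if next_start < len(records) else None


def next_url(script: str, next_start: Optional[int]) -> Optional[str]:
    """Build the URL the page would redirect to for the next batch."""
    return None if next_start is None else f"{script}?start={next_start}"


def run_all(
    records: Sequence[dict],
    batch_size: int,
    index_fn: Callable[[Sequence[dict]], None],
) -> int:
    """Simulate the chain of redirects; each pass stands in for one request."""
    start: Optional[int] = 0
    requests = 0
    while start is not None:
        start = index_batch(records, start, batch_size, index_fn)
        requests += 1
    return requests
```

In a real template, each request would read `start` from the URL, query only that slice of records, index it, and then redirect to `next_url(...)` until it comes back empty. Keeping the offset in the URL means a failed batch can be resumed by reloading the same address.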

As for speed (web project and searching): Both FarCry and Solr laugh at 7.5m records (the amount I was using) and had zero issues with the amount of records (no speed degradation).

Hope that helps,

Jeff Coughlin


#3

Thanks for the reply Jeff. Sounds like I’m on the right track and sounds straightforward with a little elbow grease.