I ran into a strange request timeout issue on my FarCry 6.2 site when it was switched to connect to the production database server.
Exactly the same code, running on the same server, connects to an internal SIT database server without any issue.
Since the data source changed, every page request times out, even if the page contains only one line of cfoutput code.
It looked like a database issue to me initially, so I created another test site (non-FarCry) with a single page that queries the database I’m having problems with, and I get the response back without issue.
From the debugger information I can see that C:/wwwroot/mysite/farcry/core/tags/farcry/_farcryApplicationInit.cfm took a long time to execute:
Total Time Avg Time Count Template
When you did your isolated test query, was it using the same DSN as your FarCry app, or a different one?
Did you check to see how long that individual query took? For example, was it < 10ms? When Core starts up it may need to run several hundred queries, so if you have any kind of unusual latency, then over many hundreds of DB requests it could be grinding to a halt.
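To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes the startup queries run one after another, and the figure of 300 queries and the three latencies are hypothetical, not measurements from this site.

```python
def estimated_startup_seconds(query_count: int, per_query_seconds: float) -> float:
    """Sequential queries: total wait is roughly count times per-query round trip."""
    return query_count * per_query_seconds


# Hypothetical figures: 300 startup queries at three different round-trip times.
for latency in (0.005, 0.1, 5.0):
    total = estimated_startup_seconds(300, latency)
    print(f"{latency * 1000:>7.0f} ms per query -> {total:>7.1f} s total")
```

Even a modest per-query delay adds up quickly across a few hundred round trips, which is why a single test query can look acceptable while application startup still blows past the request timeout.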
Is the new DB server on the same LAN segment as the previous DB server that was working? Is it somewhere further away?
I doubt it’s just a configuration setting on your app server. It does seem more like a networking issue, but it could also be very hard to diagnose without doing lots of different tests.
Yes, the test query from the non-FarCry app was connecting to the same DSN
as the FarCry site. The isolated test was not very fast either, especially
the first time I requested it, but after about 10 seconds
it did return the result. Subsequent requests completed instantly.
The new DB server is in the DMZ. What actually happened is that the CF server
was originally on the local corporate network, connecting to a database
server also on that network. I did all my testing there and everything worked
fine, so the CF server was moved into the DMZ, which is the same DMZ where
the production database server sits. The standard SQL Server port, 1433,
has been opened so that I can create the DSN in CF admin.
I also feel it has something to do with the network, but it’s a bit hard to explain to
the network guys because, from their point of view, as long as I can telnet to
that port they’ve done their work.
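A telnet test only shows that the port is open, not how long it takes to get there. A small script that times the connection gives the network team a number to look at. The following is only a sketch in Python (run from the CF server); the function name `time_connection` is made up for this example, and it times name resolution and the TCP connect separately on the assumption that either one could be the slow step.

```python
import socket
import time


def time_connection(host: str, port: int, timeout: float = 15.0) -> dict:
    """Time name resolution and TCP connect separately, in seconds."""
    start = time.perf_counter()
    # Resolve the host name first so a slow lookup shows up on its own.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    resolved = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
    connected = time.perf_counter()
    return {"resolve_s": resolved - start, "connect_s": connected - resolved}
```

Calling something like `time_connection("your-db-host", 1433)` a few times in a row, from both the old and the new CF server, should show whether the delay is in the lookup, in the connect, or in neither (in which case it happens later, during the SQL conversation itself).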
It’s a tricky one. My initial thought was latency, but that should be easy to test by doing a ping.
If it’s not latency… is there any kind of network device in between the app server and DB server that does packet inspection? Those types of appliances could have a performance impact.
If the app server and DB server are both in the DMZ on the same network segment then I would expect the network performance to be fine.
The other things that you could check are network interface card drivers (this does happen from time to time, even on virtual servers). Make sure there are no strange port settings on switches and no saturated links between the devices, or even test whether things other than database connections are slow (e.g. file transfers).
Do you have another app server which can access the database server? (Maybe one that can tunnel through the firewall?) Or vice versa, another database server that you can restore the database on and test from the app server? Perhaps you can narrow down whether the app server or the DB server has the problem, or whether the problem is in between them…
I know that seems like a lot of generic advice, but it’s the best I can offer without seeing the environment.
A couple of other easy things to check in the CF Administrator: make sure all debug settings are turned off, as well as all the request monitoring stuff. The monitoring features are a production killer (at least in older versions of CF they were for sure).
I found that many queries against the farPermission table took about 5 seconds each to retrieve just one record from the database. I can see there are lots of queries to the farPermission table when the core starts up.
So I did a test from a SQL client on the problematic CF server, running exactly the same query, and it also took 5 seconds to retrieve one record from the database.
Then on the old CF server (which has no problem) I used the same SQL client to run exactly the same query and it returned the result instantly, in effectively zero seconds.
The problem on the new server only started after it was moved into the DMZ.
I had a meeting with our network, database and server guys yesterday. We ran
a bunch of tests together. We noticed that from the SQL client on the
problematic server, even a query fetching 5000 records takes
5 seconds, the same as retrieving just 1 record. Every subsequent query
also takes a very consistent 5 seconds. So it’s not related to the data
size, and they agreed it’s more of a latency issue. The network guy then made some
tweaks, something to do with resetting some filters; I don’t know exactly what. That
fixed the 5-second problem for the SQL client. Now, from the SQL
client on both servers (the old one without the problem and the new one with the timeout
issue), the same query gets pretty much the same response time.
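The reasoning that a constant 5 seconds points to a fixed delay rather than data size can be written down as a simple straight-line fit. This is just an illustrative Python sketch: the function name is invented, and the two samples are the rounded figures quoted above (about 5 s for 1 row and for 5000 rows), not precise timings.

```python
def fit_fixed_and_per_row(samples):
    """Least-squares fit of seconds = fixed + per_row * rows.

    samples: list of (row_count, elapsed_seconds) pairs.
    """
    n = len(samples)
    mean_rows = sum(r for r, _ in samples) / n
    mean_secs = sum(s for _, s in samples) / n
    spread = sum((r - mean_rows) ** 2 for r, _ in samples)
    if spread == 0:
        raise ValueError("need at least two different row counts")
    per_row = sum((r - mean_rows) * (s - mean_secs) for r, s in samples) / spread
    fixed = mean_secs - per_row * mean_rows
    return fixed, per_row


# Rounded figures from the tests above: about 5 s for 1 row and for 5000 rows.
fixed, per_row = fit_fixed_and_per_row([(1, 5.0), (5000, 5.0)])
print(f"fixed overhead ~{fixed:.2f} s, per-row cost ~{per_row:.6f} s")
```

With those two samples the whole 5 seconds lands in the fixed term and the per-row cost is zero, which matches the conclusion that something is adding a constant wait to every query regardless of what it returns.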
But when I ran my FarCry site, I still got a timeout. Then I switched my DSN
to the internal DB server, and the site started successfully. Now the weirdest
thing happened: I switched the DSN back to the DB server in the DMZ, did a
hard refresh, and the site started but was extremely slow. Just to
make sure, I restarted ColdFusion, and then I got the timeout again.
I’m wondering what the next best thing to try would be.
This isn’t a problem in the application. These are normal queries that look up the permissions, and the results then get cached.
Your problem is that all queries are slow, so focusing on this specific query will not resolve the problem.
You said that queries in the SQL client were slow, and then your network/server guys “changed something” and now the queries in the SQL client are fast. Can you please find out exactly what they changed? This could be critical to getting your app working.
When you set up your DSN in CF are you using the exact same settings as above in the SQL client?
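One low-tech way to check is to write both sets of connection settings down side by side and list every difference. A rough Python sketch, where the keys and values are purely hypothetical placeholders and not the real settings of this site:

```python
def diff_settings(sql_client: dict, cf_dsn: dict) -> dict:
    """Return {key: (sql_client_value, cf_dsn_value)} for every mismatch."""
    keys = sorted(set(sql_client) | set(cf_dsn))
    return {
        k: (sql_client.get(k), cf_dsn.get(k))
        for k in keys
        if sql_client.get(k) != cf_dsn.get(k)
    }


# Hypothetical values for illustration only.
sql_client = {"server": "10.0.0.5", "port": 1433, "database": "mysite"}
cf_dsn = {"server": "proddb01", "port": 1433, "database": "mysite"}
print(diff_settings(sql_client, cf_dsn))
```

Anything that shows up in the result (for instance one side using an IP address and the other a host name) is a candidate for explaining why the SQL client is now fast while the DSN is not.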
Like I said before, you really need to find out what changes were made which fixed the performance of the SQL client. If you can find out what those changes are, then perhaps we can work out how to fix it.