OK, this is a debug session in progress, so don’t expect a nice solution at the end. We’re working on a project that does analysis of some public voter registration data. The DB is hosted on Amazon RDS and I’ve been perplexed by how poorly queries are performing there, despite the tables only have about 10 million rows. Simple queries are taking many minutes, which is orders of magnitude slower than my laptop.
Mark suggested running ‘VACUUM ANALYZE’, which I didn’t think would help because my understanding was that the autovacuum process in PostgreSQL would be taking care of that on a regular basis. These queries had been slow for days with no recent inserts or updates, so certainly autovacuum should have caught up to them by now. But, I tried it anyway and lo and behold:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Woah, that worked! Sure, it took 3+ hours to run ANALYZE, but wow. So, why isn’t autovacuum automatically doing this for us. (I mean it has the phrase ‘auto’ in its name!!!)
I’ve found this great article on autovacuum basics which led me to do this query:
1 2 3 4 5 6 7 8 9
So those 2 large tables haven’t been ANALYZEd in weeks, despite the fact that we import a 10 million row CSV once every week. This is the end of my debugging road, for now. Hopefully, I’ll figure out what’s going on.