
Optimizing fetches for high-volume, high-traffic site


Thursday 17 June 2010 9:58:28 am - 8 replies

Modified on Thursday 17 June 2010 10:06:31 am by Marko Žmak


Gaetano Giunta

Friday 18 June 2010 1:44:24 am

Your questions are pretty much self-answering... It shows you have investigated the problem thoroughly ;-)

- doing custom sql queries: very possible, and it will give you great speedups. Use the ezdb API and write your own SQL. Hire (or be) an SQL guru to get the best results. There is no "standard" "more efficient" API in eZ, because the most efficient query will be the one that only works in your db (you can hardcode class ids, knowledge of the content tree structure, and such)
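As an illustration only, a hand-tuned query of the kind described above might look like this (table and column names are from the standard eZ Publish schema; the class id is a placeholder you would hardcode for your own site):

```sql
-- Fetch only the node ids of the latest published objects of one class.
-- 42 is a placeholder class id: hardcoding your real one is exactly the
-- kind of site-specific knowledge that makes custom SQL fast.
SELECT t.node_id
FROM ezcontentobject_tree t
     INNER JOIN ezcontentobject o ON o.id = t.contentobject_id
WHERE o.contentclass_id = 42
  AND o.status = 1              -- 1 = published
ORDER BY o.published DESC
LIMIT 20;
```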

- adding indexes: very much possible. It might even be a good idea in conjunction with the point above

- using solr: quite possible. By default the index IS updated on every object publication, unless you use delayed indexing; net effect: publishing objects gets slower, as it now includes the indexing phase

- usage of load_data_map: try it out: enable echoing of the generated SQL into the debug output with that parameter on and off, then run the generated queries with tracing enabled to see the difference. The idea is: if it is disabled, more, smaller queries are generated (the data_map is fetched later on); if it is enabled, a single query with more joins is generated (all attributes are fetched in one pass)
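Schematically, the difference between the two modes looks roughly like this (heavily simplified; the real queries generated by eZ have many more columns, joins and filters):

```sql
-- load_data_map disabled: the main fetch stays lean...
SELECT t.node_id, o.*
FROM ezcontentobject_tree t
     INNER JOIN ezcontentobject o ON o.id = t.contentobject_id;

-- ...and each data_map is fetched later, per object (123 / 2 are
-- placeholder object id and version):
SELECT * FROM ezcontentobject_attribute
WHERE contentobject_id = 123 AND version = 2;

-- load_data_map enabled: one bigger query with the attribute join,
-- so all attributes come back in a single pass:
SELECT t.node_id, a.*
FROM ezcontentobject_tree t
     INNER JOIN ezcontentobject o ON o.id = t.contentobject_id
     INNER JOIN ezcontentobject_attribute a
        ON a.contentobject_id = o.id AND a.version = o.current_version;
```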

Principal Consultant International Business
Member of the Community Project Board

Marko Žmak

Saturday 19 June 2010 3:50:08 pm

Thanks Gaetano, here are some more thoughts/questions...

Custom SQL queries: Actually I would like to avoid that, because using the DB abstraction layer is safer - I don't want to have to upgrade my custom queries when I upgrade to a newer version of eZ that changes the database table structure.

eZ APIs: I would still like to know if there is an eZ API function that does this: get only the node IDs of the latest objects belonging to a class. Does anyone know of one?

load_data_map: I have investigated further with the load_data_map parameter, and indeed it doesn't join the ezcontentobject_attribute table when it is disabled. I'll give this a try first and see how it goes.

eZ Find: Can eZ Find be used to fetch all the latest nodes, without using the keyword parameter? Would using eZ Find in this way be faster or slower than doing a normal fetch? Does anyone know?

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Marko Žmak

Sunday 20 June 2010 8:53:08 am

After further investigation I came to an interesting point...

The SQL queries produced when fetching nodes always have joins on these tables:

  • ezcontentobject_tree
  • ezcontentobject
  • ezcontentclass
  • ezcontentobject_name

The problem is that these joins ALWAYS produce a temporary table on disk, because the joined tables contain longtext fields (MySQL can't use memory storage for tables containing text fields). Specifically these:

ezcontentclass - field "serialized_name_list"

ezcontentobject_tree - field "path_identification_string"

As a result, a temporary table gets created on disk on EVERY fetch made in eZ (list or tree fetch), no matter how high you set the MySQL memory limits. Since fetches are the most common way of retrieving data in eZ, this can lead to some serious performance issues.
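For anyone who wants to verify this on their own database, the on-disk temporary tables can be observed from the MySQL client (this is a generic MySQL technique, not specific to eZ):

```sql
-- Before and after running the suspect fetch query, compare:
SHOW SESSION STATUS LIKE 'Created_tmp%';
-- If Created_tmp_disk_tables grows by one per fetch, the join's
-- temporary table spilled to disk.

-- EXPLAIN on the fetch query (paste it from the eZ debug output)
-- will show "Using temporary" in the Extra column, though it does
-- not distinguish memory from disk - SHOW STATUS does.
```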

The first questions that come to mind are:

  • Are these two fields really so important that they have to be fetched in this join? How often are they actually used, and could they be fetched later?
  • Could some of these fields be converted to varchar?

This is a problem that the eZ team should deal with and correct in some future eZP version. Here are some suggestions and ideas:

1) Convert the "serialized_name_list" to varchar

All the entries in my ezcontentclass table have this field shorter than 100 chars. Why does it have to be a longtext field?

Also, is this field really needed here? Could its data be split across other table fields?

2) Move "path_identification_string" out of ezcontentobject_tree table

The first idea is to move this field out of the ezcontentobject_tree table, for example into a new table ezcontentobject_tree_path, so that it doesn't get involved in this fetch but is fetched later when needed.

Another idea would be to convert path_identification_string to varchar. For example, in my DB the longest occurrence of this field is 199 characters. Could this conversion be done? What are the implications?

I think this issue deserves some serious consideration from the eZ development team, since it can bring some really big performance gains. So what are your opinions?

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Gaetano Giunta

Sunday 20 June 2010 12:37:35 pm

Very interesting findings indeed. Thanks for the time you spent analyzing this!

My thoughts (disclaimer: I do not work for the engineering division):

'serialized_name_list': this is a horrible hack, and it should really be undone.

Unfortunately, the recent addition of "class description" and "attribute description" in content classes was done using the same strategy, i.e. a single text field where the multiple language versions of the description are stored. Which means we now have even more text columns to get rid of. I already filed a bug in the tracker about this, with my idea: move all the 'translatable' fields into a separate table in the db, with one language per row, instead of gluing them together in huge blobs. The advantage is that over time more translatable elements can be added without a big fuss.

'path_identification_string': this one is tougher, as it sits deeper in the system. But since we have multilingual URLs, I think we should drop this column altogether... If we cannot remove it, we surely cannot shrink it: we have many sites where the name of a single node (i.e. one part of the path_identification_string) is longer than 255 chars on its own...

Principal Consultant International Business
Member of the Community Project Board

Gaetano Giunta

Sunday 20 June 2010 12:39:51 pm

PS: since you tested the size of your data in those two columns and found it small, could you try to ALTER those columns to varchar and measure the speed/db load difference?
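On a test copy of the database, that experiment could look like the following (a sketch only: the varchar length is based on the data sizes reported above, and this departs from the stock eZ schema, so never do it on a production db):

```sql
-- TEST DATABASE ONLY: shrink the two longtext columns so that the
-- join's temporary table is no longer forced to disk (the MEMORY
-- engine cannot hold TEXT/BLOB columns). The length 200 is a guess
-- based on the longest values observed above.
ALTER TABLE ezcontentclass
      MODIFY serialized_name_list VARCHAR(200);
ALTER TABLE ezcontentobject_tree
      MODIFY path_identification_string VARCHAR(200);
```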

Principal Consultant International Business
Member of the Community Project Board

Marko Žmak

Wednesday 04 August 2010 12:39:00 am

Sorry for the late answer, but I've been busy lately... Here are some results of my research.

Removing longtext fields

I tried altering the columns path_identification_string and serialized_name_list, changing them to varchar(200). After that, the temporary table for these problematic joins is no longer created on disk but kept in memory; I verified this by inspecting the MySQL status variables before and after the query. So that is good news.

But unfortunately, this didn't give any improvement: the problematic query was not faster. Which is very strange - it seems the in-memory temporary table is no faster than the on-disk one.

One possible reason could be that while this temporary table is nominally created in memory, the system allocates that memory in disk swap, so it actually ends up on disk anyway. This is just one idea and I didn't have time to inspect it further; if anyone has another idea, feel free to share it.

Still, I encourage anyone to give this a try on a test database and report the results if it brings any improvement.

Use eZ Find for fetching

I gave this a try in a quick test, and the results are promising. It works nicely for some cases. The only "complicated" part is deciding where to use it, and mapping the parameters from a regular fetch to the eZ Find fetch.

What does the load_data_map parameter actually do?

I gave this a try and it didn't bring any improvement. Actually, the performance graphs showed a little more load on the database server, so I gave up on it. But for some other sites it could still produce a performance improvement; it probably varies from case to case.

Optimizing indexes

I found a way to get a big performance speedup by adding some indexes. It involves sections and an "archiving" mechanism. I'm currently testing it and will report more when I'm done.

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Luis Delgado

Saturday 09 April 2011 6:35:03 pm

Marko,

Hi, after reading the whole thread I find myself with a similar issue. Unfortunately I am not as experienced as you are. The above issues also apply in my case; I've been tweaking my.cnf, growing the cache tables over and over. I now have around 4000...

As I am not an SQL expert, I am trying to find out how to index join queries, but I'm lost here... Have you been able to optimize the indexes? I'm sure this thread will be very helpful for a lot of people trying to achieve a stable / memory-efficient / low-server-load configuration.

Cheers

GOOD 4 ALL

Marko Žmak

Monday 11 April 2011 12:30:04 am

Hi Luis.

In the end I solved my problem with indexes like this:

  • indexed the ezcontentobject table on the section_id field
  • created a section "Archive"
  • created an archiving extension that changes the section of objects (moves old objects into the "Archive" section)
  • modified the critical fetches so that only objects not in the "Archive" section are fetched
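A minimal SQL sketch of the schema side of this approach (the index name is arbitrary, and the "Archive" section id is hypothetical - look yours up in the ezsection table):

```sql
-- Index the section column so filtering on it is cheap:
CREATE INDEX idx_ezco_section_id ON ezcontentobject (section_id);

-- Critical fetches then exclude the "Archive" section (5 is a
-- placeholder section id):
SELECT t.node_id
FROM ezcontentobject_tree t
     INNER JOIN ezcontentobject o ON o.id = t.contentobject_id
WHERE o.section_id <> 5        -- skip the "Archive" section
ORDER BY o.published DESC
LIMIT 20;
```

The point of the archive section is to keep the working set that these fetches scan small, so the index stays effective as old content accumulates.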

You can use this archiving extension; it's not yet documented, but you can contact me if you need help setting it up.

Also, if I could take a look at your database, your site's code, and the SQL queries it produces, I would be able to help you more. Feel free to contact me about that too.

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth
