Development Magento

Indexes are friends, not food

2015-01-30_1348

When I first got into Magento development, in my mind, there were two ways of getting data from the database.  You would either call Mage::getModel(‘catalog/product’)->load($id) or you would work with the collection.  If you wanted to get a filtered list of something you would use the ORM to get it.

But as I’ve gained more experience I realized that there was more to the puzzle.

One of the things I’ve consistently seen in production deployments is that a lot of devs don’t get beyond the 1:1 model/collection view.  Consider a scenario where you wanted to list all of the visible products in the same category as a product provided on the URL.  Your code might look something like the following.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
public function testAction()
{	
	$product = Mage::getModel('catalog/product')->load(
		$this->getRequest()->getQuery('id')
	);
	/* @var $product Mage_Catalog_Model_Product */
 
	$category = $product->getCategoryCollection()->getFirstItem();
	$productCollection = $category->getProductCollection();
 
	foreach ($productCollection as $product) {
		if ($product->isVisibleInCatalog()) {
			echo $product->getSku() . '<br>';
		}
	}
}

I will admit that this is somewhat contrived, but you will see the pattern an awful lot in Magento deployments.  Though, truth be told, if this example were to be accurate with a lot of the code I see it should look more like this.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public function testAction()
{	
	$product = Mage::getModel('catalog/product')->load(
		$this->getRequest()->getQuery('id')
	);
	/* @var $product Mage_Catalog_Model_Product */
 
	$category = $product->getCategoryCollection()->getFirstItem();
	$productCollection = $category->getProductCollection();
 
	foreach ($productCollection as $product) {
                $product = Mage::getModel('catalog/product')->load($product->getEntityId()); //WHAT!!  yep.
		if ($product->isVisibleInCatalog()) {
			echo $product->getSku() . '<br>';
		}
	}
}

What is one way that we could do this better?

Use an index!  In particular catalog_category_product_index.  Following is its CREATE DDL (sans foreign keys and indexes).

CREATE TABLE `catalog_category_product_index` (
  `category_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Category ID',
  `product_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Product ID',
  `position` int(11) DEFAULT NULL COMMENT 'Position',
  `is_parent` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Is Parent',
  `store_id` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Store ID',
  `visibility` smallint(5) unsigned NOT NULL COMMENT 'Visibility'
);

It contains the calculated relations between products and categories.  But not just categories, sort order, visibility and a few other things.  We can use that knowledge and build out a mechanism that, while more complicated, will execute much less logic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public function test2Action()
{
    $productCollection = Mage::getModel('catalog/product')->getCollection();
    /* @var $productCollection Mage_Catalog_Model_Resource_Product_Collection */
    $select = $productCollection->getSelect();
    $select->join(
        ['ccpi' => Mage::getSingleton('core/resource')->getTableName('catalog/category_product_index')],
        'ccpi.product_id = e.entity_id',
        null
    );
 
    $cond = $productCollection->getConnection()->quoteInto(
        'ccpi2.category_id = ccpi.category_id
        AND ccpi2.is_parent = 1
        AND ccpi2.product_id = ?
        AND ccpi2.visibility IN (2,4)',
        $this->getRequest()->getQuery('id') 
    );
    $select->join(
        ['ccpi2' => Mage::getSingleton('core/resource')->getTableName('catalog/category_product_index')],
        $cond,
        null
    );
 
    foreach ($productCollection as $product) {
        echo $product->getSku() . '<br>';
    }
}

Now you might be thinking “how in the world did he come up with that!?”  It’s actually quite simple.  I knew I needed the data from catalog_category_product_index.  So what I needed to figure out was the query that would get me there.  So I eventually came up with the query:

SELECT cpe.sku, ccpi.category_id FROM catalog_product_entity AS cpe
INNER JOIN catalog_category_product_index AS ccpi
 ON ccpi.product_id = cpe.entity_id
INNER JOIN catalog_category_product_index AS ccpi2 
 ON ccpi2.category_id = ccpi.category_id
 AND ccpi2.is_parent = 1
 AND ccpi2.product_id = 1001
 AND ccpi2.visibility IN (2,4);

From there I constructed the Zend_Db_Select statement incrementally, adding clauses until it looked kind of similar.

For this type of scenario in this size of dataset the benefit is not that much.  But you add complexity, dataset size and server load and the drawbacks of the previous method becomes noticeably worse.

Does this mean that you can build out your own indexes and provide the same level of benefit in your custom code with custom logic?  Yes!  But not from this blog post.  🙂

Leave a Reply

Your email address will not be published. Required fields are marked *