LimitSubqueryOutputWalker walkSelectStatementWithoutRowNumber doesn't preserve order #6530

Open
opened 2026-01-22 15:34:35 +01:00 by admin · 0 comments
Owner

Originally created by @Ispolin08 on GitHub (Aug 30, 2020).

Hi here.

I am sorry I quite tired already, I spend one day of my life to find the reason why my ES doesn't index 7kk items but only 1.6kk. Thats why I will be quick in description.

Basically All I had to do its get products from my mysql with limit 60k and offset 3kk. To to it the paginator plans to perform

SELECT DISTINCT id_5 FROM (SELECT DISTINCT id_5 FROM (SELECT p0_.count_views AS count_views_0, p0_.vendor_code AS vendor_code_1, p0_.shop_code AS shop_code_2, p0_.model AS model_3, p0_.notes AS notes_4, p0_.id AS id_5, p0_.original_id AS original_id_6, p0_.name AS name_7, p0_.url AS url_8, p0_.description AS description_9, p0_.slug AS slug_10, p0_.created_at AS created_at_11, p0_.updated_at AS updated_at_12, p0_.import_version AS import_version_13, p0_.search_tags AS search_tags_14, p0_.rating_data_vote_count AS rating_data_vote_count_15, p0_.rating_data_vote_sum AS rating_data_vote_sum_16, p0_.rating_data_rating AS rating_data_rating_17, p1_.archived AS archived_18, p2_.enabled AS enabled_19 FROM product p0_ INNER JOIN product_sku p1_ ON p0_.id = p1_.product_id INNER JOIN product_category p2_ ON p0_.category_id = p2_.id WHERE p1_.archived = 0 AND p2_.enabled = 1) dctrn_result_inner ORDER BY id_5 DESC) dctrn_result LIMIT 100 OFFSET 300000

This is last place in the chain where I expect the problem. In case of small limit it works fine. Probably its only one reason why this bug was not founded early.

But as you can see the limit is for SELECT DISTINCT and SELECT DISTINCT is not managing the order. In the end this query with different offsets like 300000 and 300100 return SAME RESULT, or partially same.

Fortunately I solved my case just by disabling $fetchJoinCollection.

I discovered LimitSubqueryOutputWalker and its putting order from my base query to dctrn_result but int should ORDER dctrn_result itself.

Also I don't know why its here in case if I am not selecting anything from joined tables but let it be.

From other hand I've got a lot of new information about ES itself during I tried to find why the amount of docs is frozen. As you can understand now it was bcs this query returns same rows which was already in index and 75% of time its just rewrite it.
Thanks.

Originally created by @Ispolin08 on GitHub (Aug 30, 2020). Hi here. I am sorry I quite tired already, I spend one day of my life to find the reason why my ES doesn't index 7kk items but only 1.6kk. Thats why I will be quick in description. Basically All I had to do its get products from my mysql with limit 60k and offset 3kk. To to it the paginator plans to perform SELECT DISTINCT id_5 FROM (SELECT DISTINCT id_5 FROM (SELECT p0_.count_views AS count_views_0, p0_.vendor_code AS vendor_code_1, p0_.shop_code AS shop_code_2, p0_.model AS model_3, p0_.notes AS notes_4, p0_.id AS id_5, p0_.original_id AS original_id_6, p0_.name AS name_7, p0_.url AS url_8, p0_.description AS description_9, p0_.slug AS slug_10, p0_.created_at AS created_at_11, p0_.updated_at AS updated_at_12, p0_.import_version AS import_version_13, p0_.search_tags AS search_tags_14, p0_.rating_data_vote_count AS rating_data_vote_count_15, p0_.rating_data_vote_sum AS rating_data_vote_sum_16, p0_.rating_data_rating AS rating_data_rating_17, p1_.archived AS archived_18, p2_.enabled AS enabled_19 FROM product p0_ INNER JOIN product_sku p1_ ON p0_.id = p1_.product_id INNER JOIN product_category p2_ ON p0_.category_id = p2_.id WHERE p1_.archived = 0 AND p2_.enabled = 1) dctrn_result_inner ORDER BY id_5 DESC) dctrn_result LIMIT 100 OFFSET 300000 This is last place in the chain where I expect the problem. In case of small limit it works fine. Probably its only one reason why this bug was not founded early. But as you can see the limit is for SELECT DISTINCT and SELECT DISTINCT is not managing the order. In the end this query with different offsets like 300000 and 300100 return SAME RESULT, or partially same. Fortunately I solved my case just by disabling $fetchJoinCollection. I discovered LimitSubqueryOutputWalker and its putting order from my base query to dctrn_result but int should ORDER dctrn_result itself. Also I don't know why its here in case if I am not selecting anything from joined tables but let it be. From other hand I've got a lot of new information about ES itself during I tried to find why the amount of docs is frozen. As you can understand now it was bcs this query returns same rows which was already in index and 75% of time its just rewrite it. Thanks.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: doctrine/archived-orm#6530