ohioman Posted March 7, 2008 Posted March 7, 2008 I've looked for anything to make it easier to deal with this problem, but nothing seems to be around. I'm wondering if anyone knows of a contribution that deals with duplicate titles in the store database. Specifically... something controlled from the admin that would query the database for multiple entries of products and list the titles with the option of deleting all but the one you need or want. I tried going through my all products listings for doubles, but none were found. The interesting thing is that in my admin, in the catalog section, it indicates that there are 531 products (added up through all the subfolders). According to the admin front page (under statistics), there are 528. I believe the 528 is the true number and that there are duplicates of 3 products that have quite possibly been listed in a different category than the original listing. What a thing to track down!! Oh.. by the way.. i'm running MS2.2 Any help would be appreciated. Thanks, Iceman
tapuahk Posted March 7, 2008 Posted March 7, 2008 The interesting thing is that in my admin, in the catalog section, it indicates that there are 531 products (added up through all the subfolders). According to the admin front page (under statistics), there are 528. It is possible that three products are not under a category heading... Or another possibility, three records were removed without removing the index to the categories (done through SQL possibly) Just a thought... ~~~Tapuahk
ohioman Posted March 10, 2008 Author Posted March 10, 2008 Ok.. well.. that was fun. I have figured this problem out and I should start by saying that there was more than 1 thing wrong here. I had several products that were were in the products table that were not in the products_to_categories table and I unfortunately had a few duplicates in the products_to_categories as well. The true number of entries that were faulty was far from the 3 that the numbers seemed to suggest at the beginning of my post. In all, there were 15 different entries between the two tables that caused this problem. :blink: I did some homework on the net to be able to get sql queries to be able to check id numbers between these two tables for exceptions, duplicates and such. If someone in the future realizes that there are different numbers of records between the products table and the products_to_categories table, the codes below will make your life much easier. Here are the sql queries I used to fix my problems: SELECT products_id FROM products WHERE (products_id NOT IN (SELECT DISTINCT `products_id` FROM products_to_categories) ); SELECT products_id FROM products_to_categories WHERE (products_id NOT IN (SELECT DISTINCT `products_id` FROM products) ); These codes will find any records in one of the tables that isn't in the other to let you know what product_id(s) need(s) attention. Depending on which table has more records in it, you will use one or the other. If you're not sure which one.. try them both. You should check both ways until both of these queries return 0 results. Once you've done that, you're half way home. Another thing you want to do is check these tables for duplicate entries. This can be checked by using AS NumOccurrences in your statement... like this: SELECT products_id, COUNT(products_id) AS NumOccurrences FROM products_to_categories GROUP BY products_id HAVING ( COUNT(products_id) > 1 ); Of course, if it is the products table you want to check... change FROM products_to_categories to FROM products in this code. By the way, for those who wonder why you searched for a product that you know it was entered into the database and it doesn't show up anywhere? This page the place you need to start. That product you entered may not even show up in the main folder in the (catalog) admin section or outside anywhere else. If this is the case, plug these queries into phpMyAdmin or what ever you use and you'll know the product_id's to look at for problems. Each product should likely have a row in the products, products_description and product_to_categories tables at the very least. If not, you have inserting or deleting to do. Do remember, IF you delete a row in the products, products_description or products_to_categories tables, you should also delete the row for that product_id in the other two tables as well. Simply put, if for no other reason... good house keeping!! :rolleyes: Hope this helps!! Iceman P.S. Thanks for the help tapuahk... when you mentioned sql, that really got me thinking of doing all this!!!
tapuahk Posted March 11, 2008 Posted March 11, 2008 Thanks Iceman, Always appreciate the good news. In fact, I think you should create a mod to application_top to give automatic warnings for these type of errors. It would improve the user-friendly feeling so people who don't know how to do SQL queries can feel at ease. I can write the code...rough at best, if you would want. ;) Call it "Database error checking" mod Tapuahk SELECT products_id FROM products WHERE (products_id NOT IN (SELECT DISTINCT `products_id` FROM products_to_categories) ); SELECT products_id FROM products_to_categories WHERE (products_id NOT IN (SELECT DISTINCT `products_id` FROM products) ); SELECT products_id, COUNT(products_id) AS NumOccurrences FROM products_to_categories GROUP BY products_id HAVING ( COUNT(products_id) > 1 );
ohioman Posted March 24, 2008 Author Posted March 24, 2008 Hmmm... what would you think about a link in the admin panel to use as the configuration, catlog, modules...etc links are? Perhaps click on... Check Database, then options such as: Duplicates, Missing Records, Missing Categories...(for those products not assigned to a category accidentally) etc... things like that? Looking at my admin folder here... I see room for a database.php file :) OH.. wait, just thought of something that might be even better. One link that when clicked would would query the database check each product id for the bare essentials we would look for. Example: Each product id should be checked as present in both the products table and products_to categories table. Also in the same loop, that product id should be checked for duplicates. Once done, the output would display the product id number in question, maybe the product name and exactly what was wrong with it in sections by heading... such as Duplicates in products_to_categories, Products with no categories and Products_to_categories/no products table entry? Down the road, I'm thinking this type of thing could expand to include a choice for particular fields in records. For instance... to make sure that every product has a price and is not set to $0 by mistake? Things like that!!! I am rather sure that with a bit of work, such a thing is possible and would be very helpful as you say to those who do not know how to work with SQL Queries. If they can work thier mouse... they have it made. These results would then give time saving information as to what exactly needs to be done with each record -vs- "table flipping" or running each query separately. Even perhaps down the road there is room for other things like being able to edit the records from admin according to the findings in this scripting? Perhaps there are other ideas out there??? Iceman
tapuahk Posted March 28, 2008 Posted March 28, 2008 Hmmm... what would you think about a link in the admin panel to use as the configuration, catlog, modules...etc links are? Perhaps click on... Check Database, then options such as: Duplicates, Missing Records, Missing Categories...(for those products not assigned to a category accidentally) etc... things like that? Looking at my admin folder here... I see room for a database.php file :) OH.. wait, just thought of something that might be even better. One link that when clicked would would query the database check each product id for the bare essentials we would look for. Example: Each product id should be checked as present in both the products table and products_to categories table. Also in the same loop, that product id should be checked for duplicates. Once done, the output would display the product id number in question, maybe the product name and exactly what was wrong with it in sections by heading... such as Duplicates in products_to_categories, Products with no categories and Products_to_categories/no products table entry? Down the road, I'm thinking this type of thing could expand to include a choice for particular fields in records. For instance... to make sure that every product has a price and is not set to $0 by mistake? Things like that!!! I am rather sure that with a bit of work, such a thing is possible and would be very helpful as you say to those who do not know how to work with SQL Queries. If they can work thier mouse... they have it made. These results would then give time saving information as to what exactly needs to be done with each record -vs- "table flipping" or running each query separately. Even perhaps down the road there is room for other things like being able to edit the records from admin according to the findings in this scripting? Perhaps there are other ideas out there??? Iceman All great ideas! Just did a search on people finding duplicates in their database on the forums and wow... definitely a common problem. Okay, let's definitely make it an Admin error checking tool. Let's start simple and concentrate on checking for duplicates and the option to delete extra entries from admin. We'll need a function to query each product name and look for duplicates in database. We'll need it to pass any duplicates to a new table, so the can be reviewed. [This is the meat of the contrib] A php file to call function and list them in style similar to the catalog display. Another function called by button next to duplicate to remove from database. (I think the existing Delete one is fine for this purpose. We can add checks for missing or extras in the tables later. This is an open source project so we can post code here and move discussion to a contrib thread when we have the beta finished. ;)
ohioman Posted March 28, 2008 Author Posted March 28, 2008 Ok.. sounds good. Tool is exactly right... as after looking in the Admin options... it would be logical to put it under tools.. and perhaps make it the Database Manager. So... there will be the already existing Database Backup option and then the Database Manager? I'm working on some code for the function(s) needed for this today.. so we should have a start on this shortly :rolleyes: Iceman All great ideas! Just did a search on people finding duplicates in their database on the forums and wow... definitely a common problem. Okay, let's definitely make it an Admin error checking tool. Let's start simple and concentrate on checking for duplicates and the option to delete extra entries from admin. We'll need a function to query each product name and look for duplicates in database. We'll need it to pass any duplicates to a new table, so the can be reviewed. [This is the meat of the contrib] A php file to call function and list them in style similar to the catalog display. Another function called by button next to duplicate to remove from database. (I think the existing Delete one is fine for this purpose. We can add checks for missing or extras in the tables later. This is an open source project so we can post code here and move discussion to a contrib thread when we have the beta finished. ;)
tapuahk Posted March 28, 2008 Posted March 28, 2008 Check part needs put inside loop to check for all product ids that are not NULL. $duplicate_check_query = tep_db_query("select count(*) as total from " . TABLE_PRODUCTS_TO_CATEGORIES . " where products_id = '" . (int)$products_id . "' and categories_id = '" . (int)$new_parent_id . "'"); $duplicate_check = tep_db_fetch_array($duplicate_check_query); if ($duplicate_check['total'] > 1) tep_db_query("insert into " . TABLE_DUP_PRODUCTS . " (products_id, categories_id) values ('" . (int)$products_id . "', '" . (int)$current_category_id . "')"); This might not be exactly correct but it's a start anyway
tapuahk Posted March 29, 2008 Posted March 29, 2008 I think we could have it automatically remove duplicates: Something like the following: WARNING THIS IS UNTESTED CODE DO NOT RUN ON LIVE STORE Please use test database for error checking... $query = mysql_query("SELECT products_id, COUNT(products_id) AS NumOccurrences FROM products_to_categories GROUP BY products_id HAVING ( COUNT(products_id) > 1 );"); while ($row = mysql_fetch_array($query)) { $dup= $row[`product_id`]; mysql_query("DELETE FROM `products_to_categories` WHERE `product_id` = $dup"); } echo mysql_error(); ?> I think this might remove both duplicates though how could I overcome that? Any ideas?
satish Posted March 30, 2008 Posted March 30, 2008 My suggestion that this will happen if a prodcut is alloted to more then 1 category. So in p2c table just fire a query and chek for count more then 1 for any product ID. Satish Ask/Skype for Free osCommerce value addon/SEO suggestion tips for your site. Check My About US For who am I and what My company does.
ohioman Posted March 30, 2008 Author Posted March 30, 2008 That actually is the very simple way of getting it done I think. Some people however, in their stores, duplicate product ID numbers are ok... as there is a mod out there that makes it possible to add a product to multiple categories. In that case, they can just choose to not delete those entries and get rid of the product ID numbers assigned to the same category more than once (as was in my case somehow). Possibly even better, the query could check for the same product ID number AND the same category ID number to guard against this? I'll have a post shortly on some code shortly. Iceman My suggestion that this will happen if a prodcut is alloted to more then 1 category. So in p2c table just fire a query and chek for count more then 1 for any product ID. Satish
ohioman Posted April 7, 2008 Author Posted April 7, 2008 Ok.. here's a bit of an update. I have been working on this issue and have some encouraging results thus far. Half of the hard part seems to be accomplished already. I'll just throw some things out there and see what comes back. Since this will be first contribution I have worked on, I hope to get things going the right way the first time. What an idea eh? :) I have already integrated a page (database_check.php) into my admin (under tools) by adding lines of code to the following files: admin/includes/languages/english.php admin/includes/filenames.php admin/includes/boxes/tools.php That was actually pretty easy as things are concerned. :rolleyes: Now the meat of the project: I thought about this quite a bit and looked around the files in the admin so that this mod can act in a similar way to the rest of the admin to fit in alright. Also, I looked for a way to write this such that it could expand rather easily as new things came along for it to check. Soooo.. i thought that the ways that the admin/configure.php handles switches and cases were a pretty good for this. As it is turning out right now.. each check can be a new "case" and can be selected by the click of a button and the appropriate query is fired and results shown if any. At this point I have added 1 case to check for duplicate product names in the products description table. It will display the following for any matching record: Products ID Products Name Products Date Added Products Manufacturer Here is the code with the query I've come up with to do this: DO NOT ADD THIS CODE TO YOUR STORE. IT HAS NOT BEEN FULLY TESTED AND IS ONLY PARTIAL CODE TO A CONTRIBUTION IN PROGRESS... THANK YOU!!! if (tep_not_null($action)) { switch ($action) { case 'products': echo "<table cellpadding=0 border=1 valign=top>"; echo "<tr>"; $products_count_query = "SELECT p.products_id, p.products_date_added, p.manufacturers_id, pd.products_id, pd.products_name, m.manufacturers_id, m.manufacturers_name, COUNT(pd.products_name) AS NumOccurrences FROM " . TABLE_PRODUCTS . " p, " . TABLE_PRODUCTS_DESCRIPTION . " pd, " . TABLE_MANUFACTURERS . " m WHERE p.products_id = pd.products_id AND p.manufacturers_id = m.manufacturers_id GROUP BY p.products_id HAVING ( COUNT(pd.products_name) > 1 )"; $products_count_query_info = tep_db_query($products_count_query) or die(mysql_error()); $num = mysql_num_rows($products_count_query_info) +1; for ($i=1; $i < $num; $i++) { $row = mysql_fetch_array($products_count_query_info); echo "<tr>"; echo "<td>", $row['products_id'], "</td>"; echo "<td>", $row['products_name'], "</td>"; echo "<td>", $row['products_date_added'], "</td>"; echo "<td>", $row['manufacturers_name'], "</td>"; echo "</tr>";} } } So far, I've only tested this by switching the COUNT (pd.products_name) >1)"; to >0)"; When I did that every product that I have in my database showed up with the accompanying information requested for that record... so.. I would have to believe it's is correct for what is needed. The idea that I had in displaying these things is that the name, date added and manufacturer can help identify further which record that someone would want to delete. Such as, if you have one that was added months ago and one just 2 days ago as a duplicate... certainly you want to delete the new one that hasn't been picked up in the search engines yet??? :) Also, in some cases with us... there might be a product that has the same name as an other.. but is made by a different manufacturer with a different price even. In that case, with this it will be very easy to tell and those records can stay in tact. Neat huh? :) I have a sample default page at this moment with some text and a clickable button to enact the products case for duplicates. That seems to work like a charm. Now I am working on displaying buttons with every record that comes up so they can be deleted if necessary. Earlier in the thread, it was mentioned that perhaps sql would not just delete 1 record out of 2 identical records? Well.. all indications i have found say exactly that. Sql in that situation will delete both records and there you'll be!!! My thinking is that this will only happen in the situation with products_to_categories. All others should have content in that particular record that can make it unique (Ex: products_date_added). What's the likelyhood of THAT being the same for 2 records??? I have read about a different approach to this odd situation though: -Copy the table to a "temp" table using "unique" which will filter duplicates to the copied database -Empty the original table completely -Copy contents of "temp" table to original table I think that's quite a way to do it, but it is all I can find at this point. If anyone else has found something out there that's been missed.. it would be great to hear about it. Since cases can simply be added to this, I'd like to hear some different things to search for. The duplicates that I can think of at the moment are: -Product names in table products_description -Products_id in products_to_categories Got any more? I am thinking this might be a good time to add it??? It's still got a bit to go yet.. extra things to check for, titles to tables and things like that. However, I think by the time this is done.. it should be pretty cool. I've never really messed with switches and cases like this.. so.. I'm actually learning the hard way on this one. Should be an interesting ride. :) Sooo.. any of you that have some suggestions of any kind, they would be greatly appreciated. After all.. this will be for all of us to enjoy!!! Iceman :thumbsup:
tapuahk Posted April 10, 2008 Posted April 10, 2008 Sooo.. any of you that have some suggestions of any kind, they would be greatly appreciated. After all.. this will be for all of us to enjoy!!! Iceman :thumbsup: Suggestion: Make two options for every duplicate entry found: Edit: edit the item as you would in the catalog section of the admin Delete: add an are you sure? before actually firing the delete command to the database. Make selection from list similar to catalog section: ie clicking anywhere on a row as selects that item. Other things to test for: *Products with price of $0.00: Problem if you are trying to make a profit. *Products with no title: this is common problem for multi-language shops, which causes SEO mods to not work correctly. *Products without a category: NOT always a problem, but makes product difficult to find by users. *Categories with no products: NOT always a problem, could have only sub-categories. *Products with no weight that do not have download attribute assigned: Either a virtual product without a file associated or a real product missing its weight. *Products with 0 available: Not a problem, but the list can make updating or removing these easier than going the catalog route. I could add more if time permits. The above though should kick this mod into the I can't live without it category. Let me know how I can help, ~~~Tapuahk
ohioman Posted April 10, 2008 Author Posted April 10, 2008 Interesting you should mention 2 options. After my last post, I decided to make the product ID number clickable to go to categories.php to be able to edit that product on the fly. I figured.. why add more code when it already existed in that section of admin. Less work on my part.. and no duplication!! :) As for the products with no price.. yep.. i forgot to list that one here.. but.. it's on the list to be added. I've taken note of all the other suggestions as they would be great additions to this mod. Also... I have thought to work this as perhaps 2 different ways to be able to do this... all in the same mod. Admin could choose from the following: 1) Choose which query they want to run one by one. 2) A step by step approach ... one right after another with the option to "skip" a step or leave if so desired. It appears and acts very much as a tutorial would. The step by step is what is being written right now.. however.. it would not be difficult with that code in place to edit it for the other version of it too or instead. What do you think? I just thought that maybe it might be easier for a newbie to go through everything.. or it would be easy to make sure you don't miss one by mistake. Just a thought. Perhaps with either option is the best? Iceman :thumbsup: Suggestion: Make two options for every duplicate entry found: Edit: edit the item as you would in the catalog section of the admin Delete: add an are you sure? before actually firing the delete command to the database. Make selection from list similar to catalog section: ie clicking anywhere on a row as selects that item. Other things to test for: *Products with price of $0.00: Problem if you are trying to make a profit. *Products with no title: this is common problem for multi-language shops, which causes SEO mods to not work correctly. *Products without a category: NOT always a problem, but makes product difficult to find by users. *Categories with no products: NOT always a problem, could have only sub-categories. *Products with no weight that do not have download attribute assigned: Either a virtual product without a file associated or a real product missing its weight. *Products with 0 available: Not a problem, but the list can make updating or removing these easier than going the catalog route. I could add more if time permits. The above though should kick this mod into the I can't live without it category. Let me know how I can help, ~~~Tapuahk
tapuahk Posted April 12, 2008 Posted April 12, 2008 I think if you have them go step by step is great. However if you consider each query ("Check") to be a separate function then you could have a sub-menu to allow users to fire off just the single query. Sort like: Iceman's Error Checking Tool which runs all the queries. Products with duplicates Products with price of $0.00 Products with no title Products without a category Categories with no products Products with no weight that do not have download attribute assigned Products with 0 available So I think both options would be preferred.
ohioman Posted April 12, 2008 Author Posted April 12, 2008 As it is at the moment, they are listed as "cases" in the main file.. but the function idea is an interesting one. Since the code is already there, it would be very simple to have them into a function file which they can be called on separately as needed. This is exactly as the existing menu is right now for it. I guess that means that also I can just add a menu to the "default case" for which any of the queries can be chosen at will. Definately something that isn't difficult to do either. I like it. Iceman :thumbsup: I think if you have them go step by step is great. However if you consider each query ("Check") to be a separate function then you could have a sub-menu to allow users to fire off just the single query. Sort like: Iceman's Error Checking Tool which runs all the queries. Products with duplicates Products with price of $0.00 Products with no title Products without a category Categories with no products Products with no weight that do not have download attribute assigned Products with 0 available So I think both options would be preferred.
ohioman Posted April 28, 2008 Author Posted April 28, 2008 **UPDATE** Well.. well.. this certainly has been an interesting ride creating this mod. There has been many things that have come up since the time I have started. However, I am happy to say that what I consider to be the hard part is now over. I have 7 different checks working now. They are: 1) Duplicate Products 2) Products No Title 3) Products No Price 4) Products No Weight 5) Products No Category 6) Categories No Products 7) Duplicate Products To Categories And.. believe you me.. number 7 was the tricky one!!! Woof! I didn't go the route of copying tables or anything... as I really didn't like that option.. and sought out another alternative. I had the other steps done in no time flat! Here's one thing that's not so good.. it was mentioned before about the 0 weight thing that did not have download attributes. Well.. I have a confession to make.. my store does not deal with downloadable material at all. Even though I have looked those fields over, I still do not really understand the relationship of those to other things...what they mean ... let alone the relationship to themselves. Therefore, I am quite uncertain as to how to implement them into this mod. All 7 steps you see above do not take into consideration any downloadable products. Sorry! There are a few upshots to the situation though. The idea of having the options of either choosing what to check for.. or using the "step by step" method... works without a hitch! Additionally, you will be pleased to know that any records found in any of these checks show additional information about each record (if available). For instance, if it brings an Product ID Number... the products name will surely be there as well... Categories ID.. with the categories name too. I figured.. a bunch of numbers wouldn't really mean anything to people. So, to make it more friendly.. easier to use there is enough information about each record automatically there row by row. THEN.. product id numbers and categories id numbers are also clickable links. When a products_id is clicked.. it populates categories.php (in a separate window) with the appropriate information to be able to edit that product should you see fit. With categories id's... it also goes to categories.php and you can view the contents of that category should you want to do more investigation before you make deletes.. or anything else. I really wanted this to be able to integrate with atleast one other part of admin to make it feel a part of it.. and.. I believe I've done just that. All records listed have a delete button which asks... Do you really want to delete? (Ok) (Cancel) to help avoid mistakes. Buttons back to the main page and to the next step are available at all times. You are never locked in the step by step.. or anywhere for that matter. You can skip a step without making changes to the current step. Your original admin left_column is always at your disposal should you want to do something elsewhere in your admin. If everything is fine with your current step.. it will tell you.. No records found! Should you choose to delete a record.. once you do.. it will run the step again to give you updated results (after the delete). So.. in effect you can do this until you have no more results and go to the next step if you want to.... 1 by 1. I have tried to give as much flexibility to this and had some interesting challenges along the way to make this mod something that the most people out there could use. To give you one example: People who have their products assigned to multiple categories.. can use this mod with no problem that I can find. I had a problem getting "true matches" with step 7.. when products were assigned to more than 1 category.. but.. that problem I believe is 100% solved according to my tests. I have tried to assure this by making sure that in the count of numberoccurances.. that a true match is by both products_id AND categories_id. Then.. on the delete part of it... only limit the number of records to delete at 1. In this case: Products_id Categories_id 45 15 45 15 45 20 According to my tests.. you should be left with this: Products_id Categories_id 45 15 45 20 as it should be. Initially... this was a real problem. I'm sure that there's a few things I haven't mentioned.. but.. that's the idea of it anyways. There's nothing earth shattering about the code in this and perhaps someone out there can optimize things to make it .001 seconds faster or whatever they measure that stuff in. :rolleyes: I've never been a real buff in that speed stuff. I just know that I've been up to my forehead in mysql for sometime now and have come up with results that are exactly what I've been after. Now that the big stuff is over... I'm going to now clean it up a bit... throw a few more monkey wrenches at it.. and a few other small things. All that shouldn't take me too long.. and.. then maybe throw it out here and see if we can break it! >_< Hopefully, I've done this well enough that It will stand. Ohioman :thumbsup:
ohioman Posted May 22, 2008 Author Posted May 22, 2008 **UPDATE** Alrighty then... this is the last update I'll make for this thread as the Datbase Check Tool v1.0 is now a reality. This is the support thread which you can find the link to download it if you wish: http://www.oscommerce.com/forums/index.php?showtopic=302994 All questions or feedback concerning it should now be posted there. Ohioman B)
Recommended Posts
Archived
This topic is now archived and is closed to further replies.