Jump to content
  • Checkout
  • Login
  • Get in touch

osCommerce

The e-commerce.

Illegal mix of collations


heikobear

Recommended Posts

Posted

Hi

 

I am pretty desperate about an illegal mix of collation message which appears for some clients in my website.

 

The message is like this:

 

1267 - Illegal mix of collations (cp1251_general_ci,IMPLICIT) and (latin1_swedish_ci,COERCIBLE) for operation '='

select zone_id from zones where zone_country_id = '30' and (zone_name = 'São Paulo' or zone_code = 'São Paulo') limit 1

 

I already made collation adjustments in the database, but that didn't make any difference.

The first time this error appeared it was also related to zone_id but with another zone. By the way, it was not even the zone the client is registered.

 

The error occurs when a client checks out for payment http://www.euro-deko.com/checkout_shipping.php

 

Thanks in advance,

 

Heiko

Posted

You must still have a mixture of encodings/collations in your database AND/OR page display encoding AND/OR cut-and-paste text source. The error message shows both CP1251_general_ci and Latin1_swedish_ci in use. Did you cut and paste in text from Word (CP-1251)? It probably contains "Smart Quotes" characters that aren't valid in the database default Latin-1 (Swedish collation). Furthermore, the zone_name shows UTF-8 text 'ã' in use somewhere.

 

You've got a mess on your hands.

Posted

Hi Phil

 

thanks a lot of your response. I am actually not proficient in SQL, I passed the issue to my coder who changed the collations.

What do you mean by page display encoding and cut-and-past text source?

 

Is the problem related to different language packages?

Posted

"Encoding" is the binary representation of the "glyph" (displayed character). Almost every encoding out there is a superset of 7-bit ASCII, differing only in the "upper 128" characters (accented Latin letters in Latin-1, for instance). CP-1251 and Latin-1 are "single byte" encodings, and so limited to 256 characters. UTF-8 is "multibyte" and can handle at least 65536 characters, enough for all the languages in the world. "Collation" is the order used for sorting and comparing strings. English is simply A to Z, while other languages have accented characters with specific rules about how they sort. Any language with upper and lower case letters is going to have Case-Insensitive or Case Sensitive sorting modes.

 

You need to be consistent in the encoding (and collation) used for your database, your language support files, how your pages are displayed, and (to an extent) any text you're cutting and pasting from. Your pages (on the browser) are generally going to be either Latin-1/ISO-8859-1 encoding (osC 2.2 default) or UTF-8 encoding (osC 2.3 default). There is mention in the error message about CP-1251, which is used on Windows, so I presume that you're cutting and pasting from Word or Outlook. The problem with doing that is "Smart Quotes" (typographically "correct" quotation marks, among other things) will come along for the ride, and they're invalid characters in Latin-1. It appears that if your site is in UTF-8, it will usually correctly convert them to UTF-8 codes, but that depends on Microsoft getting a long sequence of clipboard-related operations correct each time.

 

If you have more than one language supported, they should all be the same encoding (e.g., ISO-8859-1 or UTF-8) declared in the "set_locale" near the top of the <language name>.php file. The database should be the same encoding, and the page should be displayed in that encoding. If different languages declare different encodings, you will have problems with a mismatch between the database and the page display for at least one language.

 

Different collations (sorting order) are a lesser problem, and apply to database operations. MySQL doesn't like sorting or comparing two strings when one is, say, "general" collation and the other is, say, "swedish". Different encodings are a more serious problem. Latin-1 and CP-1251 are fairly close, but CP-1251 has a few characters (Smart Quotes) that will cause problems on display. UTF-8 is quite different from either of them, beyond the basic ASCII characters they share in common.

Posted

Hi Phil

 

this sounds complex, but why this error only happens with a few users?

If things a mixed up, should the error occur all the time?

I tried it in several environments (different providers, different places (Brazil, Germany), different browsers, hardware) and never get this error.

Other also said this would be a bug in SQL and I should update SQL.

Posted

Are these users the only ones hitting some improperly entered data, such as the São Paulo zone?

 

It does seem odd that only a few customers would hit this, if it's a misdeclared database field, so I would concentrate on the source of data. Did you cut and paste in names and data from your PC? You need to account for why some data appears to be CP-1251 (Windows Western European) and some is UTF-8. It's possible that the CP-1251 is a red herring and it's actually seeing the UTF-8 being treated as single byte encoding, and thinking it's CP-1251.

 

Your site set up for Latin-1 (ISO-8859-1) -- your database should be too (latin1-swedish-ci is the MySQL default). If you cut and pasted in certain data from a UTF-8 source, that would be a problem.

Posted

Hi Phil

 

Do you mean pasting on the site files? I dont think so.

What also happens, the portuguese characters are not recognized by the site and also appear like ?. I understand this shouldn´t happen as the Portuguese package is installed.

 

I checked now all the char settings in the language php file as you decriped and it is the following:

 

// charset for web pages and emails

define('CHARSET', 'iso-8859-1');

 

@setlocale(LC_TIME, 'pt_BR.ISO_8859-1')

 

db collation utf8_general_ci

 

In another post I actually saw that one should use iso-8859-1 in the language files and utf8_general_ci in the DB.

Now there comes in pt_BR.ISO_8859-1.

 

I have no idea if this is correct or not.

Posted

ISO-8859-1 is Latin-1 (Western European). pt_BR is various language-sensitive settings for Brazilian Portuguese. Now, if your database is UTF-8 you would need have language files (both English and Portuguese) that are UTF-8. Or, everything Latin-1. Whoever claimed that you should have UTF-8 database and Latin-1 page encoding is an idiot. You can't mix them like that and expect them to consistently work correctly. A major problem you're having is that you have text coming in as Latin-1 (or even CP-1251 if you're cutting and pasting from Word or Outlook) and osC and MySQL are trying to deal with UTF-8 data in the database. It's a mess.

Posted

Hi Phil

 

Thanks a lot for your contributions. I actually was wrong and the database was in CP-1251.

I just changed every table one by one to utf8. Strangely, this changes do not show up in the general database view, only when I look in Structure.

 

Also, my provider told me to turn off add_default_chartset and I also turned off display_errors.

 

Now at least characters which where input in the site are displayed correctly. I didn't change anything in the portuguese language file. Both Setlocale and Charset should be the same, saying utf8?

 

I asked a client who received the error message to simulate again and I guess now it will work.

Posted

If both Setlocale and Charset are being specified, they should be consistent encoding -- both UTF-8, or both Latin-1, etc. And they should match what the database is using.

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...