Encoding Issues - I'm confused


greasemonkey


Hi everyone, my head hurts... I'm struggling to understand the encoding issue I have - and the more I read the more confused I get... I'm hoping someone can help.

 

The issue is minor... relatively speaking. I am upgrading from 2.2ms2 to 2.3.3 and I'm in the final stages of testing. I plan to go live this Friday. This is one of the last problems I have to fix, and I have been working at it for 2 days without any progress... only confusion.

 

My store is English only, however, I'm in Canada and have many French customers from Québec.

 

Everything seems to be displaying correctly on the customer and admin side... however, in testing I have noticed encoding issues in several places where é, as an example, is being converted to é;

 

1) in the text I have added to the checkout_process email body at $email_order .= a dash "-" is being dropped completely. Within the text added to paypal_express at $email_order .= the same dash is being converted to � (diamond with ?)

2) in the customer's name within the "To:" line (the customer delivery and billing names and addresses are fine)

3) when customer orders are passed to PDF with my batch printer

4) when customer info is passed to PayPal Express

5) and a few other places... I presume all the issues are related

 

I never had issues with my 2.2 store... (everything was ISO-8859-1)

In the 2.3.3 database (which I have upgraded from 2.2) all text seems fine...

 

Both language files in my 2.3.3 store are set to

define('CHARSET', 'utf-8');

 

And my pages are served with: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

My phpMyAdmin server connection collation is set to utf8_general_ci

 

And all my tables (except for a few created by addon like SEO Header Tags) in the 2.3.3 database are set to latin1.

 

Am I supposed to update all the tables to utf8? If so I presume I can figure out an ALTER TABLE script to do this...

Also I notice some files, when opened in Notepad++, are encoded ANSI and others are UTF-8 without BOM (I'm not sure what BOM is???)

I have tried adding http://addons.oscommerce.com/info/7628 ... and no change...

Should all files be encoded as UTF-8 without BOM, or as ANSI? I presume there is no way to change this except to open each file with Notepad++ and change it?

 

I am also confused by a line in my language files:

in catalogue/includes/language/English.php I have @setlocale(LC_TIME, 'en_US.ISO_8859-1');

In the admin I have setlocale(LC_TIME, 'en_US.ISO_8859-1');

Should these not be UTF something? I also notice there is no @ symbol in the admin file...

Sorry for the long post... But, at least for me, it's a very confusing problem... and I'm now chasing my tail....

 

Scott


You will need to "convert" all your language files to UTF-8 (that is, re-save the files themselves in UTF-8 encoding), or alternatively you could set your charset to ISO-8859-1 for the upgraded store too.

� (diamond with ?) just indicates a byte sequence that is not valid in UTF-8. If you use a text editor set to UTF-8, edit that text, replace that symbol with the same or a similar character typed on your keyboard, and then save the file, it should show just fine.
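For anyone who would rather not re-save each file by hand, here is a minimal Python sketch of what "converting" a file means at the byte level. It is only an illustration: the helper name to_utf8_no_bom is made up, and it assumes the legacy files are Windows-1252 (what Notepad++ calls "ANSI" on a Western PC), which may not be true for every file.

```python
import codecs


def to_utf8_no_bom(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy single-byte text as UTF-8 without a BOM.

    source_encoding is an assumption about the input; pass "latin-1"
    instead if that is what the files really are.
    """
    # Drop a UTF-8 BOM if an editor added one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        # Already valid UTF-8 (pure ASCII counts too): leave the bytes alone
        # so the text is not encoded twice.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    return raw.decode(source_encoding).encode("utf-8")


print(to_utf8_no_bom(b"Qu\xe9bec"))  # b'Qu\xc3\xa9bec'
```

To apply it to a real file you could read the bytes with pathlib.Path(...).read_bytes() and write the result back with write_bytes(), after making a backup copy first.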


@@greasemonkey

 

There are some interesting points here, point 8 sounds about like yours:

 

http://stackoverflow.com/questions/275411/php-output-showing-little-black-diamonds-with-a-question-mark

 

I agree with Nick, choose a format and make sure your files and database are saved in that format consistently.

Matt


@@toyicebear , thank you very much... I would presume it would be best to stay in UTF-8 at this point...

I'm having a problem converting the file to UTF-8. I have set the charset already... I presume that is not what you mean?

define('CHARSET', 'utf-8');

When I open the admin English file with Notepad++ and click Encoding I see it's ANSI.... So I click "Convert to UTF-8 without BOM" to match the catalogue file and then save & close the file... however, when I re-open the file it's back to ANSI????

 


Ok, I got both English files converted to utf-8... but doesn't fix the issue...

 

Should I now look at converting the DB tables to UTF-8? And if so, what will this do to the current customer data that uses French characters?


Ok, that just caused more issues...

 

So, currently I have the catalogue English file as UTF-8 without BOM and the admin English file as ANSI... Why can I not change the admin English file to UTF-8 without BOM? It just reverts back to ANSI...


@@toyicebear @@mattjt83 Just wanted to say thank you both... Nick and Matt. As suggested I have chosen ISO-8859-1... and have undone all the UTF-8 changes.

 

Everything is all good...

 

It's funny, looking back, how I chased my tail through 7 different files trying to fix this and I ended up each time just making it worse.

 

I'm still confused as to why I could not change my admin English file encoding to UTF8 without BOM.

 

 


osC 2.3 is really meant to run in UTF-8, so it's unfortunate that you chose to convert back to Latin-1. As long as you're careful to ensure that every single field in your database is Latin-1, all your language files are Latin-1, and the pages are output in Latin-1, it should work. However, don't be surprised to have trouble with future updates or add-ons that expect your store to be in UTF-8.

 

é, as an example, is being converted to é

The é was either newly entered in UTF-8, or the database field was converted to UTF-8, but the page is being displayed in Latin-1/Windows-1252, so the two bytes that make up a UTF-8 é are shown as two separate characters. Windows-1252 is Latin-1 + "smart quotes" characters. Most likely you have a language file that still instructs the browser to display in Latin-1.
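The effect is easy to reproduce outside the store. This is a small Python sketch, not anything from osCommerce, and the helper names are made up: it encodes text as UTF-8 and then reads the bytes back as Latin-1, which is what a browser does when the stored text and the declared charset disagree in this direction.

```python
def seen_as_latin1(text: str) -> str:
    """What a Latin-1 reader displays for text that was stored as UTF-8."""
    return text.encode("utf-8").decode("latin-1")


def repair_from_latin1(garbled: str) -> str:
    """Undo the misreading, assuming it happened exactly once."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = seen_as_latin1("Québec")
print(garbled)                      # QuÃ©bec
print(repair_from_latin1(garbled))  # Québec
```

The é becomes the two bytes C3 A9 in UTF-8, and Latin-1 shows those as the two characters Ã and ©.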

 

a dash "-" is being dropped completely ... the same dash is being converted to � (diamond with ?)

You edited the text in Word or Outlook and cut and pasted it into a file (NEVER do that), and it brought along a "smart" en-dash or em-dash as a single Windows-1252 byte (0x96 or 0x97). In Latin-1, bytes in the 0x80-0x9F range are control codes rather than printable characters, so trying to display one on a Latin-1 page will often result in the character, or even the rest of the text, being dropped. It sounds like the PayPal code was trying to interpret it as UTF-8, where a lone byte like that is an invalid sequence (� glyph).
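A short Python sketch of that same byte read three ways, for illustration only (the variable names are made up, and Python's cp1252 codec stands in for whatever Word or Outlook put on the clipboard):

```python
import unicodedata

# Windows-1252 stores the "smart" dashes as single bytes.
en_dash = "\u2013".encode("cp1252")  # b'\x96'
em_dash = "\u2014".encode("cp1252")  # b'\x97'

# Read as UTF-8, a lone 0x96 byte is invalid, so a lenient decoder
# substitutes U+FFFD, the "diamond with ?" replacement character.
as_utf8 = en_dash.decode("utf-8", errors="replace")

# Read as Latin-1, the same byte maps to U+0096, a control code
# with no visible glyph.
as_latin1 = en_dash.decode("latin-1")

print(en_dash, em_dash)
print(repr(as_utf8), unicodedata.category(as_latin1))
```

Retyping the dash as a plain hyphen on the keyboard, in an editor set to the store's encoding, avoids both outcomes.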

 

And all my tables (except for a few created by addon like SEO Header Tags) in the 2.3.3 database are set to latin1.

Whoa! If you're going to have UTF-8 page display, you need all fields converted to UTF-8. Period. Otherwise they're supplying single byte accented characters (like é) to a browser expecting UTF-8, and � results.
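As a rough sketch of the database side, the Python below only builds the MySQL statements; it does not run them. The helper name and the table names are examples, not a complete osCommerce table list. It assumes the latin1 tables really contain Latin-1 data: if UTF-8 bytes were ever written into latin1 columns, a plain CONVERT TO would garble them, so take a backup and try it on a copy first.

```python
import re


def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Accept only simple identifiers so a name cannot break the backtick quoting.
        if not re.fullmatch(r"[A-Za-z0-9_]+", table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


# Example table names only; list your own with SHOW TABLES.
for statement in convert_statements(["customers", "orders", "address_book"]):
    print(statement)
```

CONVERT TO changes the table default and every text column in one go, which is why it is used here rather than changing only the table's default character set.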

 

Also I notice some files, when opened in notepad++ are encoded ANSI and others are UTF-8 without BOM (I'm not sure what BOM is???)

"ANSI" is the name Notepad++ (and Windows) uses for the system's default single-byte code page, which on a Western European or North American PC is Windows-1252. It is not the same thing as ASCII, which is the 7-bit set of 128 characters that covers plain English text. Any file that is pure English text contains only ASCII characters unless you have some non-ASCII text or punctuation mixed in. Pure ASCII text is perfectly compatible with UTF-8, as ASCII is the single-byte subset of UTF-8. If you edit pure English text, "ANSI", Latin-1, and UTF-8 without BOM will be indistinguishable. Latin-1 extends ASCII with accented Latin characters and punctuation in the upper 128 byte values (0x80-0xFF, of which the first 32 are control codes). UTF-8 uses multibyte sequences for anything beyond ASCII, including accented European characters and every other script in Unicode. A "Byte Order Mark" (BOM) is an optional three-byte signature (EF BB BF) at the very beginning of a file that lets whoever is reading the file know that it's UTF-8. Despite the name, it says nothing about byte order in UTF-8; that only matters for UTF-16 and UTF-32. Unfortunately, PHP does not throw away a BOM: it sends those three bytes to the browser as output ahead of everything else in the file, which can cause all sorts of problems (stray characters at the top of the page, "headers already sent" errors). That is why you want UTF-8 without BOM.

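To make the byte-level picture concrete, here is a small Python illustration (the variable names are made up; the define() line is the one from the language file quoted earlier in the thread):

```python
import codecs

ascii_line = "define('CHARSET', 'utf-8');"

# Pure ASCII text produces identical bytes in all three encodings.
same_bytes = (
    ascii_line.encode("ascii")
    == ascii_line.encode("latin-1")
    == ascii_line.encode("utf-8")
)

# An accented character is one byte in Latin-1 but two bytes in UTF-8.
latin1_e = "é".encode("latin-1")  # b'\xe9'
utf8_e = "é".encode("utf-8")      # b'\xc3\xa9'

# "utf-8-sig" is Python's name for UTF-8 with a BOM: the same bytes
# with the three-byte signature EF BB BF in front.
with_bom = ascii_line.encode("utf-8-sig")
bom_only = with_bom[:3]

print(same_bytes, latin1_e, utf8_e, bom_only == codecs.BOM_UTF8)
```

Those three extra bytes in front of the opening PHP tag are what end up being sent to the browser as output.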
 

in catalogue/includes/language/English.php I have @setlocale(LC_TIME, 'en_US.ISO_8859-1');

In the admin I have setlocale(LC_TIME, 'en_US.ISO_8859-1');

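Yes, those should have been changed from ISO_8859-1 to a UTF-8 locale (on most Linux servers that would be 'en_US.UTF-8', but check which locale names your server actually has installed). Note that setlocale(LC_TIME, ...) only affects locale-formatted dates such as month and day names, so it may not be what was overriding the other places that you set up to be UTF-8. The @ in the catalogue file just suppresses any PHP warning from that call.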

 

All in all, you only went about half way in converting to UTF-8. No wonder you had trouble! If you ever choose to try UTF-8 again, I hope the above notes will help you.


@@MrPhil , thank you very much for such a detailed reply. I really appreciate it. Yes you are correct... It looks like I didn't go "all the way".

 

I did find a PHP script on the Zen Cart site to convert the database to UTF-8, which worked... However, the last thing left for me, I think (before I chose to move back to ISO), was the admin English file... Do you have any clue as to why I could not change the encoding on this file to UTF-8 without BOM (it would change to UTF-8... but that then caused a bunch of issues)?

 

In any case I'm going live in 2 days... so once I am up and running I will take another crack at it in my sandbox.


An English language file probably doesn't have any accented characters in it, or specials such as a £ sign. If that's true, every single character in the file is ASCII, and the bytes on disk are identical whether you save it as "ANSI" or as UTF-8 without BOM. An editor will not be able to tell the difference for that file, which is why Notepad++ reports ANSI again when you re-open it. They're interchangeable. Don't worry about it.
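A rough Python sketch of the guess an editor has to make when it opens a file without a BOM. The function name and the labels are made up, and real editors use more elaborate heuristics than this.

```python
import codecs


def classify(raw: bytes) -> str:
    """Guess how a file's bytes are encoded, roughly as an editor might."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if raw.isascii():
        # Nothing distinguishes ASCII, "ANSI" and BOM-less UTF-8 here.
        return "ascii (identical in ANSI and UTF-8)"
    try:
        raw.decode("utf-8")
        return "utf-8 without BOM"
    except UnicodeDecodeError:
        return "legacy single-byte (e.g. Latin-1 or Windows-1252)"


print(classify(b"define('CHARSET', 'utf-8');"))
print(classify("Québec".encode("utf-8")))
print(classify("Québec".encode("latin-1")))
```

The first case is the admin English file: with only ASCII bytes in it, there is nothing for the editor to detect, so it falls back to its default label.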

