Coded characters not being retained


sumdexusa


Whenever I edit a product, if the title of the product has a code for a trademark or some other coded symbol, upon previewing and publishing, the character or symbol turns into a square symbol. It is very irritating as any edit in a product means the code has to be retyped.

 

Is there any way to retain the code? Any help would be appreciated. Thank you!

 

EDIT: To be clear, adding the code for the first time or with an edit will properly display the symbol. But when you edit the product without re-typing the code, the title displays the correct symbol in the edit screen, but upon preview/submit of the changes, the symbol becomes a square.


Thanks for the assistance KJ666. Unfortunately, just like all other codes, those aren't retained in the title either.

 

What is weird is that all the other text input boxes (specifications, features, description) retain all codes for these symbols. It is only the title text input box that, when the product is edited and the code is not manually retyped, turns the symbol that was properly being displayed into a square symbol (□).

 

Any other ideas?


How do you create the trademark symbol in the first place? If it's cut-and-paste from a PC document, is the PC's character encoding different from what osCommerce's page encoding is (say, Windows-1251 vs Latin-1)? If the encodings are different, the trademark character on the PC (the particular byte code) will be invalid on your web site and you'll see some sort of "invalid character" mark. A "trademark symbol" isn't what's copied over in the PC clipboard -- it's a specific number dependent on the encoding used. The "paste" operation simply inserts that numeric code into the browser window, and if the encoding is different, either the wrong symbol shows up or the character is rejected as invalid. When you say that you "type it in again", is it entered in a different way than before? Perhaps using a browser "Insert character" menu function?


Thanks for your assistance Mr. Phil.

 

I'm typing the code manually. Code I'm using is &trade;. The same code that is retained in the specifications, descriptions, and features boxes. But in the title box, when I edit an item that I have already typed that code into, it doesn't retain it. Here is the exact sequence of events I'm doing:

 

- I enter a new item with all the proper codes for symbols like &trade;. Everything displays correctly.

- If I need to edit the product, I select it and click EDIT. All codes in the text input boxes are properly retained EXCEPT in the title text input box. In my situation, I often have to use the symbol in the title text box. There the code is automatically substituted with the symbol, which visually is OK. However, if I don't edit the title to replace the symbol with the actual code, the symbol becomes that square symbol when I click preview and then update.

- So, I have to replace the symbol with the &trade; code in the title every time I edit a product.

 

I hope that helps. Thanks again.


I have a few products with either the ™ or the copyright symbol in the product name. I copied and pasted the title from my supplier's website. Example:

product name™ more text here

 

I never used any special code to get them to show properly on my site.

 

Tim


I'm typing the code manually. Code I'm using is &trade;.

So you type in the 7 characters &-t-r-a-d-e-; into various places in the product name and description, and in just the title it gets converted only on edit to some "invalid character"? Does this "square symbol" have 4 hexadecimal digits inside it, or is it just a plain square? Could you check if you're using the same page character encoding (Latin-1, UTF-8, etc.) on all of 1) product data entry 2) product data edit and 3) shop page display?

 

Something entered as "&trade;" should not be converted to a binary character code, but should be left as that HTML entity for the browser to render properly. I could see single ampersands "&" getting converted to "&amp;" so that they don't confuse the browser, but if that was happening to &trade; you should see literally "&trade;" in the product display.

 

In addition to checking the character encoding (charset= in the <head>), maybe you could go to the character in question, whether it's showing as "TM" or as the "square symbol", and see what the HTML source to your page shows (View > Page source). If it's a binary character (not "&trade;"), try different encodings (View > Character Encoding or something similar) for Latin-1, UTF-8, Windows-1252, etc. until you find one that it displays properly in. That might give a clue as to how it was converted to the wrong binary code.
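
If it helps, the "try different encodings" step can also be done outside the browser. This is only a rough sketch in Python (not osCommerce code); the function name `decode_attempts` and the sample byte are made up for illustration. It reads the same raw bytes under each candidate encoding and reports which ones accept them.

```python
# Candidate encodings to try, mirroring the browser's encoding menu.
CANDIDATES = ["utf-8", "cp1252", "iso-8859-1"]

def decode_attempts(raw: bytes) -> dict:
    """Return how the same bytes read under each candidate encoding."""
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # these bytes are not valid in this encoding
    return results

# Hypothetical sample: the single byte 0x99, which Windows-1252 uses for
# the trademark sign.
for name, text in decode_attempts(b"\x99").items():
    print(name, repr(text))
```

An encoding that returns the trademark sign is a likely candidate for how the byte was produced; one that returns None cannot display it at all.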


Here is what I have in the character encoding part of the page source:

 

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

 

And in the page source, I see that the ™ symbol gets converted to the square. Actually, the square has an "FF FD" inside it.

 

I did some experimenting and made some observations that allow me to further clarify the situation.

 

When I insert a symbol directly (cutting and pasting a symbol) into the title and the specifications text input box, they both turn into the "FF FD" square.

 

When I insert the &trade; code into the title and specifications text input box, they both correctly turn into a symbol.

 

But when I edit that same product where I inserted the &trade; codes, in the edit screen, the title text box has turned the &trade; code into a symbol while the specifications text box retains the actual &trade; code I typed. So, if I move ahead to preview/update the product, the results are consistent with the above: the title turns the converted symbol into an "FF FD" square, while the specifications box, which retained the &trade; code, properly displays the symbol.

 

Do these observations point to some inconsistency in the coding between the title and the specifications text boxes? This seems to point away from an HTML charset problem of some sort.

 

Thanks again for your assistance. Much appreciated!


BTW, does anyone know exactly where and what file holds the instructions on how to interpret the title, specifications, description, and features text boxes that you input in the admin? Maybe I can check there too.

 

Thanks!


You've got both charset (encoding) declarations? How did you end up with that? You should have only one -- UTF-8 or ISO-8859-1 (Latin-1).
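
A quick way to confirm the duplicate declarations on any saved page source is sketched below. This is Python for illustration only; the regular expression and the name `declared_charsets` are my own assumptions, and the sample markup is the snippet quoted earlier in this thread.

```python
import re

# Matches charset=... inside a <meta> tag, case-insensitively.
CHARSET_RE = re.compile(r"<meta[^>]*charset=[\"']?([\w-]+)", re.IGNORECASE)

def declared_charsets(html: str) -> list:
    """List every charset named in a <meta> tag, lowercased, in page order."""
    return [name.lower() for name in CHARSET_RE.findall(html)]

page = """<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
"""

found = declared_charsets(page)
print(found)
if len(set(found)) > 1:
    print("conflicting charset declarations")
```

More than one distinct name in the list means the page is telling the browser two different things.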

 

A square with FFFD in it indicates that your page is in UTF-8, but it is being fed an invalid character (some browsers display this as a white question mark inside a black diamond). A "TM" symbol cut and pasted from a PC document is likely to be in Latin-1 or (worse) MS Windows-125x encoding, which are invalid codes for UTF-8. Note that "TM" in Windows-125x is also invalid for Latin-1 (one of Microsoft's "smart quotes" blunders that use reserved character codes for characters that don't appear in Latin-1).
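
To make the byte-level picture concrete, here is a small Python sketch (illustration only, nothing osCommerce-specific). It shows the trademark sign as bytes in UTF-8 and Windows-1252, confirms that Latin-1 has no trademark sign, and shows the Windows-1252 byte turning into U+FFFD when read as UTF-8.

```python
TM = "\u2122"  # the trademark sign

# The same character is a different byte sequence in each encoding.
utf8_bytes = TM.encode("utf-8")      # three bytes
cp1252_bytes = TM.encode("cp1252")   # one byte

# Latin-1 has no trademark sign at all, so encoding it fails.
try:
    TM.encode("iso-8859-1")
    latin1_has_tm = True
except UnicodeEncodeError:
    latin1_has_tm = False

# Reading the Windows-1252 byte as UTF-8 yields U+FFFD, the replacement
# character that browsers draw as a boxed "FFFD" or a question-mark diamond.
shown = cp1252_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex(), cp1252_bytes.hex(), latin1_has_tm, hex(ord(shown)))
```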

 

I just wrote a little HTML form to test this, and indeed a &trade; (or &#8482;) displays in a form text field as "TM", rather than in its original form. Perhaps there is a bug in osC where data being sent to a text field is not being converted to &amp;name;, but allowed to be rendered as "TM". Someone will have to look to see whether text with an HTML entity (&trade;) is treated differently in different editor fields (htmlspecialchars() is applied in some to convert & to &amp;, < to &lt;, and > to &gt;). You certainly don't want text that appears as "&trade;" in the input field to come back as "&amp;trade;".
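
Here is a rough model of that difference, in Python rather than PHP. `html.escape` stands in for htmlspecialchars() and `html.unescape` stands in for the browser decoding entities when it parses the field's value; the product title is made up, and whether osC actually treats the title field this way is an assumption, not something I have checked in the code.

```python
import html

stored = "Widget&trade; Deluxe"  # hypothetical title, as typed by the admin

# Field built WITHOUT escaping: the browser decodes the entity while parsing
# value="...", so the admin sees the symbol and the form posts the symbol back.
seen_unescaped = html.unescape(stored)

# Field built WITH escaping: & becomes &amp;, the browser decodes that back
# to &, and the admin sees (and resubmits) the original seven characters.
escaped_markup = html.escape(stored)
seen_escaped = html.unescape(escaped_markup)

print(seen_unescaped)
print(escaped_markup)
print(seen_escaped)
```

If the title field behaves like the first case and the other fields like the second, that would match the symptoms described in this thread.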

 

Anyway, I would clear up the conflicting encoding settings, so that all pages are singing from the same page. That might not prevent &trade; from being turned into a binary character code, but at least it shouldn't come back as the wrong byte code (Latin-1 or Windows-125x) on a UTF-8 page. You can always erase the "TM" and manually type in "&trade;" if you want to, but it shouldn't then cause any problems with encoding conflicts.


Thanks for the ideas. Seems like most of this is beyond me, but I will at least try to fix the conflicting encoding settings.

 

Any hint on where or what file I can edit to delete one of those? And which one do you think I should go with? Pros and cons? Much appreciated.


Hmm, that's odd. The first place I poked around in, which is catalog/index.php has this for the encoding:

 

<head>
<meta http-equiv="Content-Type" content="text/html; charset=<?php echo CHARSET; ?>">
<title><?php echo TITLE; ?></title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

 

What is the first one echoing? How does echo work, anyway?


Do you have any add-ons that fool with character encodings (charset)? You'll just have to look around your site to see where "UTF-8" and "ISO-8859-1" show up, and trace back to see how one or both get into a page. You might start with includes/languages/english.php and admin/includes/languages/english.php (plus any other languages you support).
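
If you'd rather not open every file by hand, something like the following Python sketch could do the looking around. The function name `find_charset_mentions` and the file extensions are my own choices. The demo runs against a throwaway folder containing one sample file; for real use you would pass the path of your shop's folder instead.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"utf-8|iso-8859-1", re.IGNORECASE)

def find_charset_mentions(root, extensions=(".php", ".html")):
    """Yield (path, line number, line) for every line naming either charset."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # latin-1 accepts every byte, so no file fails to open as text
            with open(path, encoding="latin-1") as handle:
                for number, line in enumerate(handle, start=1):
                    if PATTERN.search(line):
                        yield path, number, line.strip()

# Demo on a temporary folder holding one made-up language file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "english.php"), "w") as sample:
        sample.write("<?php\ndefine('CHARSET', 'iso-8859-1');\n")
    for path, number, line in find_charset_mentions(root):
        print(os.path.basename(path), number, line)
```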

 

If your site will be for an exclusively US customer base (English and perhaps Spanish or French speaking), Latin-1 (ISO-8859-1) would probably be fine. If you are looking to attract customers from around the world, and might want to offer more languages, consider UTF-8. More importantly, what encoding is your database currently using, and what special symbols (other than TM) are currently in there? The fly in the ointment is that you use the "TM" symbol, which any encoding can support as &trade;, but if it's converted to a binary character code at some point, you have to ensure consistency in encodings. Note that UTF-8 supports the TM symbol as a binary code, while Latin-1 does not. Windows-125x encodings support TM, but you will have to change your page encodings to Windows-125x if there's any chance that binary "TM" marks will be stored in the database. A database that is Latin-1 should be able to store a Windows-125x encoded "TM" without any problem, but a UTF-8 database may complain about it.

 

I think there needs to be a discussion in osC about under what conditions an HTML entity such as "&trade;" gets converted and stored in the database as a binary character code (1 byte in Windows-125x, 3 bytes in UTF-8). Is this desirable behavior? I would personally prefer to see an entity such as &trade; preserved as those 7 ASCII characters through any editing, not being changed to a TM symbol until the final rendering on the customer's browser.

Someone modified your index.php to hard-code a Latin-1 character encoding (ISO-8859-1), without bothering to remove the already output CHARSET (defined in english.php to be 'UTF-8'?). "echo" is just "print to output this PHP value". CHARSET is a defined name (a.k.a. "macro name") that gets substituted. Give whoever was fooling around in the code a severe wedgie -- they didn't have any idea what they were doing!


Yeah, those guys definitely need a major wedgie. Especially since now I have to deal with this!

 

Anyway, here is what I have at the beginning of my includes/languages/english.php file:

 

<?php


// look in your $PATH_LOCALE/locale directory for available locales
// or type locale -a on the server.
// Examples:
// on RedHat try 'en_US'
// on FreeBSD try 'en_US.ISO_8859-1'
// on Windows try 'en', or 'English'
@setlocale(LC_TIME, 'en_US.ISO_8859-1');

define('DATE_FORMAT_SHORT', '%m/%d/%Y');  // this is used for strftime()
define('DATE_FORMAT_LONG', '%A %d %B, %Y'); // this is used for strftime()
define('DATE_FORMAT', 'm/d/Y'); // this is used for date()
define('DATE_TIME_FORMAT', DATE_FORMAT_SHORT . ' %H:%M:%S');

////
// Return date in raw format
// $date should be in format mm/dd/yyyy
// raw date is in format YYYYMMDD, or DDMMYYYY
function tep_date_raw($date, $reverse = false) {
 if ($reverse) {
   return substr($date, 3, 2) . substr($date, 0, 2) . substr($date, 6, 4);
 } else {
   return substr($date, 6, 4) . substr($date, 0, 2) . substr($date, 3, 2);
 }
}

// if USE_DEFAULT_LANGUAGE_CURRENCY is true, use the following currency, instead of the applications default currency (used when changing language)
define('LANGUAGE_CURRENCY', 'USD');

// Global entries for the <html> tag
define('HTML_PARAMS','dir="LTR" lang="en"');

// charset for web pages and emails
define('CHARSET', 'iso-8859-1');

 

 

Here is what I have in my admin/includes/languages/english.php file:

 

setlocale(LC_TIME, 'en_US.ISO_8859-1');
define('DATE_FORMAT_SHORT', '%m/%d/%Y');  // this is used for strftime()
define('DATE_FORMAT_LONG', '%A %d %B, %Y'); // this is used for strftime()
define('DATE_FORMAT', 'm/d/Y'); // this is used for date()
define('PHP_DATE_TIME_FORMAT', 'm/d/Y H:i:s'); // this is used for date()
define('DATE_TIME_FORMAT', DATE_FORMAT_SHORT . ' %H:%M:%S');

////
// Return date in raw format
// $date should be in format mm/dd/yyyy
// raw date is in format YYYYMMDD, or DDMMYYYY
function tep_date_raw($date, $reverse = false) {
 if ($reverse) {
   return substr($date, 3, 2) . substr($date, 0, 2) . substr($date, 6, 4);
 } else {
   return substr($date, 6, 4) . substr($date, 0, 2) . substr($date, 3, 2);
 }
}

// Global entries for the <html> tag
define('HTML_PARAMS','dir="ltr" lang="en"');

// charset for web pages and emails
define('CHARSET', 'iso-8859-1');

 

I see where I should change the code. Where can I check for database compatibility?

 

Sorry for all the questions. If I haven't said it enough already, I really do appreciate the assistance.


You have

// charset for web pages and emails
define('CHARSET', 'iso-8859-1');

but at some point is it redefined to be "utf-8"?

 

In your earlier post, where there were two "charsets", did you remove the <title> tag between the lines? If not, those lines are not from the PHP code you gave. I'm wondering whether there is code somewhere else where both the utf-8 and iso-8859-1 lines are hardcoded. If not, either some place redefines CHARSET from 'iso-8859-1' to 'utf-8', or the code you gave is not what is run (it shows CHARSET being set to iso-8859-1, yet the HTML you showed has a charset of utf-8). I think you're going to have to search for where CHARSET is defined and find out where it is set to iso-8859-1 and where to utf-8.


Archived

This topic is now archived and is closed to further replies.
