osCommerce

The e-commerce.

Problem with Hebrew. Help!


akalini


Posted

My website is multilingual. One of the languages is Hebrew.

When I try to edit a category name, it lets me enter only 4 letters (?!) in Hebrew, while in German, Spanish and English I can write as many letters/words as I wish.

I would greatly appreciate any help. Thanks.

ASAF @ www.amarin.us

Posted

You need to edit the table in MySQL and change the size of the field.


Posted

The categories_name field in the categories_description table should be 32 bytes (VARCHAR(32)), which is probably 16 UTF-8 Hebrew characters, possibly one or two fewer if RTL/LTR marks have to be added. You're getting only 4? Look at the input field's HTML (browser View > Page Source) and see whether a very short length is given (say, 8 or 10). Are you using UTF-8? Which browser? Maybe it doesn't handle bidirectional scripts very well; can you type Hebrew text elsewhere without trouble? Which version of osC are you using? Maybe a very old version defines short field lengths. Are you able to type in all the Hebrew text you want, but only 4 characters are saved?

Posted

Thanks a lot for your reply, Phil! As for your questions:

I'm using UTF-8.

Browsers: both Google Chrome and Internet Explorer.

My osC version is 2.2-MS2.

I can type in the Hebrew text, but it saves only 4 characters.

I checked the page source and found the following line:

<tr><td class="e">HTTP_ACCEPT_LANGUAGE </td><td class="v">he-IL,he;q=0.8,en-US;q=0.6,en;q=0.4 </td></tr>

Any clue?

Where and what do I have to change?

How do I get access to the SQL code (if needed)?

Many thanks again!

ASAF

Posted

<tr><td class="e">HTTP_ACCEPT_LANGUAGE </td><td class="v">he-IL,he;q=0.8,en-US;q=0.6,en;q=0.4 </td></tr>

Is that something displayed on your store page? It's not a <meta> tag. It looks like the Accept-Language header of an HTTP request, asking for a page preferably in Hebrew, then less preferably in US English, and then, as a fallback, any English. If the server can't satisfy this request, you may get a "406" error. Is this from a Google results page, a webmaster page, or something similar? I don't recall ever seeing such information listed on an osC page, and in that format it's certainly meant for human eyes only. I think you're looking at the wrong page source.

 

Anyway, you say that you can type in more than 4 Hebrew characters, but the database is only storing 4? Have you confirmed that it's only storing 4, by going into phpMyAdmin and browsing the categories_name field in the categories_description table (assuming that's the right field in the right table)? If it's storing more than 4 Hebrew characters, but 8 or more Latin-script characters, we need to investigate what's going on. Are Hebrew characters displayed at the same width as Latin characters? I think they are, but if there were only room in an output display for 8 Latin characters, and Hebrew were double-width... I don't think there's any place that trims the category name down to 8 bytes (8 Latin-1 or 4 UTF-8 Hebrew characters). Is this the full category name being displayed, not some condensed or abbreviated version in a summary or index?
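To test the idea that something trims the name to 8 bytes, here is a minimal Python sketch that cuts a UTF-8 string at a byte limit. `truncate_bytes` is a hypothetical helper, not anything in osC or MySQL, and the sample word "קוסמטיקה" ("Cosmetics") is used only for illustration. Since each Hebrew letter is 2 bytes in UTF-8, an 8-byte cut keeps 8 Latin letters but only 4 Hebrew ones.

```python
# Sketch only: truncate_bytes is a hypothetical helper, not osC or MySQL code.

def truncate_bytes(text: str, max_bytes: int) -> str:
    """Cut the UTF-8 encoding of text to max_bytes, dropping any half-cut character."""
    raw = text.encode("utf-8")[:max_bytes]
    # errors="ignore" discards an incomplete multibyte sequence at the end.
    return raw.decode("utf-8", errors="ignore")

word = "\u05e7\u05d5\u05e1\u05de\u05d8\u05d9\u05e7\u05d4"  # "Cosmetics" in Hebrew

print(truncate_bytes("Cosmetics", 8))  # Cosmetic (8 letters)
print(truncate_bytes(word, 8))         # first 4 Hebrew letters only
print(truncate_bytes(word, 9))         # still 4: the half of the 5th letter is dropped
```

If the stored value matches this pattern, a byte-based limit somewhere along the way is a likely suspect. A real database layer might reject a half-cut letter or store a broken byte rather than silently dropping it, so the raw stored value is worth checking too.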

Posted

Dear MrPhil, I have the same problem with Hebrew.

When I try to add a Hebrew translation for a category, it lets me type more than 4 characters, but after I save it, it shows only 4.

Here is a sample of how it looks:

עברי ("Hebrew")

אוכל ("food")

After the 4 characters, the system just puts an invisible control character, and that's all.

Have any of you been able to find a solution to this problem?

Thanks in advance.

Avi Kosta

Posted

I'm afraid I haven't found an answer yet.

I guess I have to change something in MySQL so it will allow me to enter more than 4 characters/letters, but I'm not a programmer, so I'm stuck for now.

If someone could tell me where and what to change, that would be wonderful.

Many thanks,

ASAF

Posted

Is this UTF-8 for both of you, rather than a single-byte Hebrew encoding such as ISO-8859-8? You're not overriding any field encodings in an otherwise UTF-8 database, are you? I just looked through my Unicode book, and it doesn't give any specific function for the "shift-in" (U+000F) and "shift-out" (U+000E) codes in Hebrew (or in Unicode in general). Do you know whether your browser is inserting those codes, or are you doing it manually? Do you know what purpose they are supposed to serve? I don't see them mentioned in the osC code, so I'm assuming it's not osC entering them. There are separate "RLE" (right-to-left embedding, U+202B) and "LRE" (left-to-right embedding, U+202A), "RLO" (right-to-left override, U+202E) and "LRO" (left-to-right override, U+202D), and "RLM" (right-to-left mark, U+200F) and "LRM" (left-to-right mark, U+200E) control codes for mixing left-to-right and right-to-left scripts. I don't know which would be appropriate in your case.

 

If you are actually using a single-byte encoding such as ISO-8859-8, your browser may be using shift-in and shift-out to switch back and forth between single-byte and multibyte sequences. Have you tried this in other browsers? I find it interesting that your text got cut off right after the shift-out control character; that may indicate a browser problem. Which browser (and version) are you using? Can you type Latin characters (English text) after the shift-out without them disappearing?

 

Since I don't know or use Hebrew (or another bidirectional script, such as Arabic), I don't think I can offer any more advice than that. You're going to have to talk to someone familiar with the specific character set encoding you use, and the browser you're using, to see what's going on. It may also be something in MySQL that's going wrong (truncating a string after the shift-out, or adding the shifts upon retrieval). Sorry.
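To find out whether shift-out/shift-in or bidi codes are really in the saved name, one option is to copy the stored value (for example from phpMyAdmin) and scan it. Here is a minimal Python sketch, assuming you can get the raw string into Python. `find_controls` is a hypothetical helper, and the sample string is made up to mimic the report, not taken from anyone's database.

```python
import unicodedata

# Labels for the control codes mentioned above (SO/SI plus the bidi controls).
NAMED = {
    "\u000e": "SO (shift-out)", "\u000f": "SI (shift-in)",
    "\u202a": "LRE", "\u202b": "RLE",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u200e": "LRM", "\u200f": "RLM",
}

def find_controls(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, label) for every control (Cc) or format (Cf) character."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Cc", "Cf"):
            label = NAMED.get(ch, unicodedata.name(ch, "unnamed control"))
            found.append((i, f"U+{ord(ch):04X}", label))
    return found

# Made-up sample: a 4-letter Hebrew word wrapped in SO/SI.
sample = "\u000e\u05e2\u05d1\u05e8\u05d9\u000f"
for item in find_controls(sample):
    print(item)
```

Ordinary Hebrew letters fall in category "Lo" and are not reported, so an empty result means the stored name contains no hidden control or formatting characters.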

Posted

Hi Phil, many thanks again for your help.

Maybe we can try something simple.

Could you copy and paste the following Hebrew word (it means "Cosmetics") into your category name and save it?

קוסמטיקה

If your osC reacts like mine, you may get only the first 4 characters of this word, i.e. "קוסמ".

(You can look at the Hebrew page on WWW.AMARIN.US and see it for yourself.)

If it works OK, then something else must be wrong.

By the way, regarding web-project's reply above ("you need to edit the table in mysql and change the size of the field"): how do I do that?

Thanks and regards!

Posted

Unfortunately, I can't run the test, because my site currently uses Latin-1 encoding.

Like @travelmate81, are you getting any extra control codes there? He had shift-out and shift-in bytes around his text; can you tell whether you're getting them during input or coming back out of the database? Have you confirmed that your database is completely UTF-8, with a "reasonable" collation? Presumably your page is displayed in UTF-8, or you'd never see your Hebrew text.

 

Is it always 4 Hebrew letters, no matter what they are, or does the cut-off point depend on which letters they are? If it seems to depend on the letter, it may be because the second UTF-8 byte of most Hebrew letters falls into the Latin-1 C1 control range (0x80 to 0x9F).

In your example, the sequence of bytes is (x = multiplication sign, not the Latin letter "ex"): x<section sign> (QOF), x<Message Waiting> (VAV), x<inverted !> (SAMEKH), x<Privacy Message> (MEM), x<Start Of String> (TET), x<Undefined Control> (YOD), x<section sign> (QOF), and x<Cancel Character> (HE). They are displayed right to left.

What I'm wondering is whether, at some point in either the database or the osC code, the sequence of bytes is being interpreted as single-byte Latin-1 instead of being handled by the multibyte (mb_) string functions. This might be the code that escapes special characters in an input string before putting it in the database. For example, the second byte of TET is the "Start of String" control code, which might be confusing the Latin-1 string functions.

You could try making a word of just 6 QOF letters and see whether it gets stored properly. QOF does not contain any Latin-1 control bytes. If all 6 Hebrew letters (6 QOFs) are stored properly, then the problem is in the PHP or MySQL code, where Latin-1 control bytes are not being treated as ordinary letters (not using the multibyte string functions). If it's still cut off at 4 letters, can you tell whether there are Shift-Out and Shift-In codes around your 4 surviving letters, as in @travelmate81's example? They may be invisible. If it still cuts off at four letters, I'm out of ideas right now.
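The byte analysis above can be reproduced with a short Python sketch. `c1_report` is a hypothetical helper, not osC code. It prints each letter's UTF-8 bytes and flags any byte in the Latin-1 C1 range 0x80 to 0x9F. For the sample word, QOF and SAMEKH come out clean while VAV, MEM, TET, YOD and HE are flagged, and the proposed 6-QOF test word has no flagged bytes at all.

```python
import unicodedata

def c1_report(word: str) -> list[tuple[str, str, bool]]:
    """Return (letter name, UTF-8 bytes in hex, has a C1 byte) for each character."""
    rows = []
    for ch in word:
        raw = ch.encode("utf-8")
        # Latin-1 maps bytes 0x80-0x9F to C1 control codes, not printable letters.
        has_c1 = any(0x80 <= b <= 0x9F for b in raw)
        rows.append((unicodedata.name(ch), raw.hex(" "), has_c1))
    return rows

word = "\u05e7\u05d5\u05e1\u05de\u05d8\u05d9\u05e7\u05d4"  # "Cosmetics" in Hebrew
for name, hexbytes, has_c1 in c1_report(word):
    print(f"{name:22} {hexbytes}  {'C1' if has_c1 else ''}")

# Proposed test word: 6 x QOF (d7 a7 each), so no C1 bytes at all.
print(any(flag for _, _, flag in c1_report("\u05e7" * 6)))  # False
```

Letters from ALEF (U+05D0) through final NUN (U+05DF) all have a second byte in 0x90 to 0x9F. That is why a word built only from later letters such as QOF makes a useful control case.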

Archived

This topic is now archived and is closed to further replies.
