Search can't find Japanese

rickcogley · August 24, 2019, 1:29pm

On v.1.0.54 (build 1548), I added some Japanese to an entity title and to the text of a wiki, but search does not find it. English works fine.

Does indexing take time?

If it’s a bug, you can use this as a test:

日本語 nihongo, Japanese language

rickcogley · September 26, 2019, 9:10am

FYI, the same thing is happening in v1.0.68 (build 2001).

If I search for the exact string XYZセンサー (it’s the title of one of my test tasks), I get a hit. But I cannot search on just センサー, probably because there is no space before it. The thing is, while Japanese does have space characters (both half-width and full-width in Unicode), you don’t separate the parts of a Japanese sentence with spaces. Search tokenization is different for CJK languages.

rickcogley · October 14, 2019, 3:22pm

In v1.0.68 (build 2352), it’s still the same thing.

vadim · October 14, 2019, 7:20pm

Hi. Could you please check again? We have tried to reproduce but it works here.

rickcogley · October 14, 2019, 10:05pm

Hi @vadim, created a new task with Name:

XYZ設定

… via a Task Table.

Made settings in it, like assignee, category, state. Clicked the search icon in the sidebar and tried:

Search XYZ設定 - works
Search XYZ - works
Search 設定 - fails
Search *設定 - fails (thought I’d see if wildcard is enabled)

Does that help?

rickcogley · October 20, 2019, 12:12am

Still the same in v1.0.68 (build 2443). @vadim did you happen to try the tests I listed above?

Fwiw, search on Japanese does work in Targetprocess…

vadim · October 20, 2019, 1:04pm

The problem is that it works for me here. We need some time to investigate the problem.

rickcogley · October 20, 2019, 7:07pm

hi @vadim, thanks for trying.

Yes, that works when you search for exactly the first part of a task name.

Even if the task name is something like イメージ不可 or 業務連絡, where the Japanese is followed by other Japanese, searching on the first part of those strings does work. That is, searching イメージ returns the イメージ不可 task, and searching 業務 returns the 業務連絡 task. And your example, where Japanese is followed by Latin characters, also works as expected.

What doesn’t work is searching for 不可 or 連絡, the second half of each of the above two examples, or any Japanese from the middle of the string. Nothing is returned. So my hypothesis is that Japanese in the middle or at the end of a string is not found, assuming here that “string” means the task name.

(Also, searching for some remembered text from inside the task finds nothing, no matter what language. Maybe by design?)

In my experience, for CJK languages the search engine has to be told “search also for CJK type languages”, because the tokenizer part of the search module has to realize that indexing based on spaces won’t work.

Let me know if I can help, with any testing etc.

e.g. searching task number 69:

Edited: a little more precision in words

vadim · October 25, 2019, 9:12am

You are completely right in your supposition: at this time, search indexes only tokens that are surrounded by spaces or other delimiters.
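
For anyone following along, here is a minimal sketch of why delimiter-based tokens produce exactly the results reported above. It is only a model of the described behaviour, not the product's actual search code: the delimiter set, the prefix-matching rule, and the names `delimiter_tokens` and `prefix_search` are all assumptions made for illustration.

```python
import re

# Hypothetical delimiter set; the real one is not stated in this thread.
DELIMITERS = re.compile(r"[\s.,;:!?\-_/]+")


def delimiter_tokens(text: str) -> list[str]:
    """Split on whitespace and a few punctuation marks, lowercasing each token."""
    return [tok.lower() for tok in DELIMITERS.split(text) if tok]


def prefix_search(query: str, titles: list[str]) -> list[str]:
    """Return the titles in which some token starts with the query."""
    q = query.lower()
    return [
        title
        for title in titles
        if any(tok.startswith(q) for tok in delimiter_tokens(title))
    ]


titles = ["XYZ設定", "イメージ不可", "業務連絡"]

# Each title contains no delimiter, so each one is a single token.
print(prefix_search("XYZ", titles))   # the start of a token matches
print(prefix_search("業務", titles))  # the start of a token matches
print(prefix_search("連絡", titles))  # mid-token text: no match
print(prefix_search("設定", titles))  # mid-token text: no match
```

Because a Japanese phrase has no spaces, the whole phrase becomes one token, and only its beginning can ever match.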

rickcogley · October 25, 2019, 9:33am

@vadim yes, Chinese, Japanese and Korean don’t use Latin-style spaces between words, so search engines are tweaked to allow that kind of language to be indexed.

For example I’ve used https://fusejs.io/ on a few projects with Japanese and Chinese, and it works fine. You just have to set the option “tokenize” and it will find even a single Japanese character anywhere in the indexed text.
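
To illustrate the general idea, here is a rough sketch of character n-gram (bigram) indexing, one common way to make text without spaces searchable. This is not how Fuse.js works internally and not a proposal for the product's actual implementation; the function names `char_ngrams`, `build_index` and `ngram_search` are made up for this example.

```python
from collections import defaultdict


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return every run of n consecutive characters in the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(titles: list[str], n: int = 2) -> dict[str, set[int]]:
    """Map each character n-gram to the positions of the titles containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for gram in char_ngrams(title, n):
            index[gram].add(doc_id)
    return index


def ngram_search(query: str, titles: list[str], index: dict[str, set[int]], n: int = 2) -> list[str]:
    """Find titles containing the query anywhere, using the n-gram index to narrow candidates."""
    q = query.lower()
    if not q:
        return []
    grams = char_ngrams(q, n)
    if grams:
        # Only titles that contain every n-gram of the query can match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Query shorter than n characters: fall back to checking every title.
        candidates = set(range(len(titles)))
    # Confirm with a substring check, since shared n-grams alone do not prove a match.
    return [titles[i] for i in sorted(candidates) if q in titles[i].lower()]


titles = ["XYZ設定", "イメージ不可", "業務連絡"]
index = build_index(titles)

print(ngram_search("連絡", titles, index))  # end-of-string Japanese is found
print(ngram_search("設定", titles, index))  # Japanese after Latin characters is found
print(ngram_search("定", titles, index))    # a single character is found via the fallback
```

The trade-off is a larger index than with word tokens, which is why engines usually apply this only to CJK text.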
