Search can't find Japanese

On v1.0.54 (build 1548), I added some Japanese to an entity title and to the text of a wiki, but search does not find it. English works fine.

Does indexing take time?

If it’s a bug, you can use this for testing:

日本語 nihongo, Japanese language

In v1.0.68 (build 2001), the same thing is happening FYI.

If I search for the exact string XYZセンサー (it’s the title of one of my test tasks), I get a hit. But I cannot search on just センサー, probably because there is no space before it. The thing is, while Japanese does have space characters (both half-width and full-width in Unicode), you don’t separate parts of Japanese sentences with spaces. Search tokenization is different for CJK languages.

In v1.0.68 (build 2352), it’s the same thing still

Hi. Could you please check again? We have tried to reproduce but it works here.

Hi @vadim, created a new task with Name:

XYZ設定

… via a Task Table.

Made settings in it, like assignee, category, state. Clicked the search icon in the sidebar and tried:

  • Search XYZ設定 - works
  • Search XYZ - works
  • Search 設定 - fails
  • Search *設定 - fails (thought I’d see if wildcard is enabled)

Does that help?

Still the same in v1.0.68 (build 2443). @vadim did you happen to try the tests I listed above?

Fwiw, search on Japanese does work in Targetprocess…

The problem is that it works for me here. We need some time to investigate the problem.

hi @vadim, thanks for trying.

Yes, that works, when you search the exact first thing in a task name.

Even if the task name is something like イメージ不可 or 業務連絡, where the Japanese is followed by other Japanese, searching on the first part of those strings does work. That is, searching イメージ returns the イメージ不可 task, and searching 業務 returns the 業務連絡 task. And your example, where Japanese is followed by Latin characters, also works as expected.

What doesn’t work is searching 不可 or 連絡, the second half of each of the two examples above, or any Japanese from the middle of the string. Nothing is returned. So my hypothesis is that Japanese in the middle or at the end of a string is not found, assuming “string = task name”.

(Also, searching for some remembered text from inside the task finds nothing either, no matter what language. Maybe by design?)

In my experience, for CJK languages the search engine has to be told “search also for CJK type languages”, because the tokenizer part of the search module has to realize that indexing that assumes spaces won’t work.

Let me know if I can help, with any testing etc.

e.g. searching task number 69:

Edited: a little more precision in words

You are completely right in your assumption: at this time, search indexes only tokens that are surrounded by spaces or other delimiters.
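To illustrate why that produces the results reported above, here is a minimal sketch in Python. It is not the product's actual search code: the delimiter set, the prefix matching, and the function names `delimiter_tokens` and `prefix_search` are all assumptions made for illustration.

```python
import re

def delimiter_tokens(text):
    """Split on whitespace and a few ASCII delimiters (an assumed delimiter set)."""
    return [t for t in re.split(r"[\s.,;:!?()\[\]\-_/]+", text) if t]

def prefix_search(query, titles):
    """Return titles in which some token starts with the query (assumed matching rule)."""
    q = query.lower()
    return [
        title
        for title in titles
        if any(tok.lower().startswith(q) for tok in delimiter_tokens(title))
    ]

titles = ["XYZ設定", "イメージ不可", "業務連絡"]

print(prefix_search("XYZ", titles))   # ['XYZ設定']
print(prefix_search("業務", titles))   # ['業務連絡']
print(prefix_search("設定", titles))   # []  (no delimiter before 設定, so it is never a token start)
print(prefix_search("連絡", titles))   # []
```

Because a Japanese title contains no delimiters, the whole title becomes a single token, so only a query matching its start can hit.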

@vadim yes, Chinese, Japanese and Korean don’t use latin spaces, so search engines are tweaked to allow that kind of language to be indexed.

For example I’ve used https://fusejs.io/ on a few projects with Japanese and Chinese, and it works fine. You just have to set the option “tokenize” and it will find even a single Japanese character anywhere in the indexed text.
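For reference, one common way to index text without spaces is to index character n-grams instead of delimiter-separated tokens. The Python sketch below shows the idea with unigrams and bigrams. It is an illustration only, not how Fuse.js or this product works; the `is_cjk` ranges are a rough approximation, and all function names are made up for the example.

```python
from collections import defaultdict

def is_cjk(ch):
    """Rough check: Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF) or (0x4E00 <= cp <= 0x9FFF) or (0xAC00 <= cp <= 0xD7AF)

def cjk_ngrams(text):
    """Collect every CJK character and every pair of adjacent CJK characters."""
    grams = set()
    for i, ch in enumerate(text):
        if is_cjk(ch):
            grams.add(ch)
            if i + 1 < len(text) and is_cjk(text[i + 1]):
                grams.add(text[i:i + 2])
    return grams

def build_index(titles):
    """Map each n-gram to the set of title positions that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for gram in cjk_ngrams(title):
            index[gram].add(doc_id)
    return index

def search(query, titles, index):
    """Intersect the posting sets of the query's n-grams, then confirm by substring."""
    grams = cjk_ngrams(query)
    if not grams:
        return []  # this sketch handles only queries that contain CJK characters
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return [titles[i] for i in sorted(candidates) if query in titles[i]]

titles = ["XYZセンサー", "XYZ設定", "イメージ不可", "業務連絡"]
index = build_index(titles)

print(search("センサー", titles, index))  # ['XYZセンサー']
print(search("設定", titles, index))      # ['XYZ設定']
print(search("不可", titles, index))      # ['イメージ不可']
print(search("連", titles, index))        # ['業務連絡']
```

With this kind of index, a query from the middle or end of a title is found, including a single character.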