Pinyin search

What are the search rules regarding Pinyin without tone marks or numbers in both the new word entry search box (when adding a word to a category within a personal list) and the “My Words” search box?

  • When is a list of all words with identical characters but differing tones presented?
  • Are single character vs. multi-character searches handled differently?
  • Is white space ignored?
  • How does adding some or all tone numbers/marks change this?
  • Do these two searches behave differently?
  • What does it mean when I can find a new word to add by entering it in traditional, simplified, or fully accented Pinyin, but not via unaccented Pinyin?
  • Has anything affecting this behavior been changed recently?

As an example, when adding a new word this morning, I could find it by entering 尷尬, 尴尬, gāngà, or gan1ga4, but not ganga.

When you don’t specify tones in a pinyin search, it’s interpreted as “any tone.” So “youyu” would match you2yu2, you2yu4, you1yu4, you1yu2, etc. But when a search specifies any tones, results are restricted to the tones given, and a syllable with no tone gets interpreted as “5th tone/轻声”. So “ni3hao” or “nihao3” won’t match anything (because they get interpreted as ni3hao5/ni5hao3), but “ni3hao3” will.

Different kinds of input can return different results because they are matched differently: searches with characters try to match a word’s written form, while searches with pinyin try to match its reading. No changes to the vocab search have been made this year, though it is a feature we hope to improve and make more flexible in the future.
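To make the tone rule concrete, here is a minimal Python sketch. It is only an illustration of the behavior described above, not the site's actual code: the function names `parse_syllable` and `matches` are hypothetical, the input is assumed to be already split into syllables, and only tone numbers (not tone marks) are handled.

```python
import re

def parse_syllable(token):
    """Split one token such as 'hao3' into ('hao', 3); tone is None if absent."""
    m = re.fullmatch(r"([a-zü]+)([1-5])?", token)
    if m is None:
        raise ValueError(f"not a numbered-pinyin syllable: {token!r}")
    return m.group(1), int(m.group(2)) if m.group(2) else None

def matches(query_tokens, reading_tokens):
    """Return True if a pre-split query matches a pre-split reading.

    Assumed rule (taken from the explanation above):
    - no tones anywhere in the query: every syllable matches any tone
    - at least one tone in the query: a toneless syllable means 5th tone
    """
    query = [parse_syllable(t) for t in query_tokens]
    reading = [parse_syllable(t) for t in reading_tokens]
    if len(query) != len(reading):
        return False
    any_tone = all(tone is None for _, tone in query)
    for (q_syl, q_tone), (r_syl, r_tone) in zip(query, reading):
        if q_syl != r_syl:
            return False
        # A missing tone is treated as 5 (neutral) once tones are in play.
        if not any_tone and (q_tone or 5) != (r_tone or 5):
            return False
    return True
```

With this sketch, `matches(["you", "yu"], ["you2", "yu4"])` is true, `matches(["ni3", "hao"], ["ni3", "hao3"])` is false, and `matches(["ni3", "hao3"], ["ni3", "hao3"])` is true.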

“ganga” in particular runs into syllable-boundary ambiguity: without any additional context, it gets greedily interpreted as “gang a” instead of “gan ga”. Leading and trailing spaces are generally discarded and won’t affect search results, but a space between syllables does matter: searching for “gan ga” makes the intended syllables explicit and correctly returns 尴尬/尷尬.
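The greedy behavior can be illustrated with a short Python sketch. This is an assumption about how such a segmenter might work, not the site's real implementation: the `SYLLABLES` set is a tiny hypothetical subset of valid pinyin syllables, and `segment` is a hypothetical longest-match-first splitter with no backtracking.

```python
# Tiny illustrative subset of pinyin syllables (hypothetical, not a full table).
SYLLABLES = {"a", "ga", "gan", "gang"}

def segment(query):
    """Greedily split an unaccented pinyin query into syllables.

    Whitespace is treated as a hard syllable boundary; str.split() also
    discards leading and trailing whitespace. Within each chunk the longest
    valid syllable is taken first, with no backtracking.
    """
    result = []
    for chunk in query.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, then progressively shorter ones.
            for j in range(len(chunk), i, -1):
                if chunk[i:j] in SYLLABLES:
                    result.append(chunk[i:j])
                    i = j
                    break
            else:
                raise ValueError(f"cannot segment {chunk[i:]!r}")
    return result
```

Under these assumptions, `segment("ganga")` yields `["gang", "a"]`, while `segment("gan ga")` and `segment("  gan ga ")` both yield `["gan", "ga"]`, which mirrors why the spaced query finds the word.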
