Questions about Vocab and Item IDs

laughedelic · February 28, 2017, 12:54am

Hi!
I’m playing with the API and I have some questions about IDs. I noticed that:

Vocab IDs are like zh-马-0 for simplified and zh-马-1 for traditional (with writing: 馬)
Item IDs are like <user_id>-<vocab_id>-<part>, e.g. 415035565-zh-马-0-rune

So I wonder if I can rely on these observations. It could simplify things for me and save me some calls to the server. I understand, of course, that these assumptions may be generally wrong or things may change in the future.

josh · March 1, 2017, 6:15am

Yes, you can infer the associated vocabId from the itemId of a rune item. Skritter was initially built around simplified Chinese, then modified to accommodate traditional and finally Japanese. That led to some less-than-ideal formats for generating meaningful keys. Here is a more complex traditional mapping example:

(里) zh-里-0 / user1-zh-里-0-rune
(裡) zh-里-1 / user1-zh-里-1-rune
(裏) zh-里-2 / user1-zh-里-2-rune
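
To illustrate the mapping above, here is a minimal sketch of recovering the vocabId from a rune itemId. It assumes the `<user_id>-<vocab_id>-<part>` layout observed earlier in the thread and that the user ID itself contains no hyphen. The function name `vocab_id_from_rune_item` is hypothetical and not part of the Skritter API.

```python
def vocab_id_from_rune_item(item_id: str) -> str:
    """Extract the vocab ID from a rune item ID.

    Assumed layout (from the thread, not from official docs):
        <user_id>-<vocab_id>-<part>
    where <user_id> has no hyphen and <part> is the final segment.
    """
    # Everything before the first hyphen is treated as the user ID.
    user_id, rest = item_id.split("-", 1)
    # Everything after the last hyphen is treated as the part name.
    vocab_id, part = rest.rsplit("-", 1)
    if part != "rune":
        # Only rune items are handled here; other parts are rejected.
        raise ValueError(f"not a rune item: {item_id!r}")
    return vocab_id


print(vocab_id_from_rune_item("user1-zh-里-2-rune"))
```

Splitting once from the left and once from the right keeps the hyphens inside the vocab ID (`zh-里-2`) intact.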

laughedelic · March 2, 2017, 2:11pm

OK, thanks. But I have a couple of questions: you say

associated vocabId from the itemId of a rune item

Is it only about rune? What about other parts? Aren’t they consistent with this scheme too?

I also wanted to ask what the 2 suffix in the ID means (I thought there were only 0 for simplified and 1 for traditional). Then I tried to query these three 里 vocabs and was quite surprised: -0 is simplified, -2 is traditional, and -1 is null. Or is it affected by my account settings?

Update: If I search (query with the q parameter) any of these writings (separately): 里, 裡, 裏, I get 4 results:

zh-里-0 simplified with writing 里
zh-里-2 traditional with writing 里
zh-里-3 traditional with writing 裡
zh-里-4 traditional with writing 裏

josh · March 10, 2017, 2:24am

Whoops, looks like I missed this reply. Yes, it’s only the rune that follows that scheme. We have an old mapping system (created back around 2007, so it might be a bit weird) that generates this stuff. It works something like this:

0 - 1 - 2 - 3
"里": "里裡裏"

laughedelic · March 10, 2017, 1:23pm

Sorry, @josh, but I think you missed my concern: the IDs for this example are 0, 2, 3 and 4. There is no 1.

Anyway, I see that given just an arbitrary character there is no reliable way to generate an ID for it. It would only work for simplified characters (when you know that it’s simplified, just add zh- and -0).
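
For the simplified case, that rule could be sketched as follows. This assumes the `zh-<writing>-0` pattern seen in this thread keeps holding and that the caller already knows the writing is simplified. `simplified_vocab_id` is a hypothetical helper, not something the API provides.

```python
def simplified_vocab_id(writing: str) -> str:
    """Build a vocab ID for a writing known to be simplified Chinese.

    Assumption from the thread: simplified vocabs use the suffix 0.
    Traditional variants are not handled, since their suffixes
    (e.g. 2, 3, 4 for 里) cannot be derived from the writing alone.
    """
    if not writing:
        raise ValueError("writing must be a non-empty string")
    return f"zh-{writing}-0"


print(simplified_vocab_id("马"))
```

For traditional writings a query to the server still seems necessary.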