API - /items bugs with include_vocabs and banned vocab_id

I’ve been using the API recently to get information about items, and in general the /items endpoint is working well. I now want to get some information about related vocabs (in order to determine ilk (word/char/sent) and banned status), and I’ve encountered what look like some bugs.

First, when sending a GET request to the /items endpoint, if I have ‘include_vocabs’ set to ‘true’, I get a 400 error response. If I leave that parameter out or set it to ‘false’, the request works, but of course no vocabs are returned. The API documentation doesn’t suggest there is anything special about this parameter, and it says boolean parameters should take ‘true’ or ‘false’ values. Is this parameter unsupported, or is it a bug? (I got the same error response when I set ‘include_contained’ to ‘true’, so it might be a general issue with boolean parameters.)

Second, to look up the vocabs for an item, I would take the vocabIds from each item and then fetch those ids from the /vocabs endpoint. I did some tests banning and unbanning parts of items and then fetching the most recently changed items, and it looks like the vocabIds list is empty for item parts (rdng/defn/rune) that are banned. So if only some of the parts are banned, some items for a vocab will have an empty vocabIds list while others will not. Is this a correct observation? Is it a bug?

I’m not completely sure, but I believe I have also seen items that have no vocabIds yet are not banned. I think these are “comp” items, but I’m not sure that accounts for every case. In any event, since there are multiple reasons the vocabIds list could be empty, I can’t make any assumptions when I see an empty list, which means I can’t reliably determine an item’s ilk and banned status from the information available.
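
For context, here’s roughly what my lookup code does. This is a sketch rather than my exact code – it assumes the /vocabs endpoint takes a comma-separated ‘ids’ parameter and that the vocabs come back under a ‘Vocabs’ key, so check your own responses:

import json
import urllib
import urllib2

VOCABS_URL = 'https://legacy.skritter.com/api/v0/vocabs'

def fetch_vocabs(vocab_ids, credentials):
    # An empty vocabIds list is ambiguous (banned part? comp item?),
    # so there is nothing useful to look up in that case.
    if not vocab_ids:
        return []
    params = {'ids': ','.join(vocab_ids)}
    request = urllib2.Request(VOCABS_URL + '?' + urllib.urlencode(params))
    request.add_header('AUTHORIZATION', credentials)
    data = json.loads(urllib2.urlopen(request).read())
    return data.get('Vocabs', [])  # key name is my guess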

I have successfully used include_vocabs like this:

params = {
    ...
    'include_vocabs': 'true',
    'vocab_fields': 'id,writing,reading',
    ...
}
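
For what it’s worth, this is roughly how I pull things out of the response afterwards. The ‘Items’/‘Vocabs’ key names here are from memory – verify them against your own output:

import json

def split_response(body):
    # 'Vocabs' should only be present when include_vocabs is 'true'
    data = json.loads(body)
    return data.get('Items', []), data.get('Vocabs', [])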

I am not sure why vocabIds is a list at all – maybe it’s a Japanese thing? In Chinese I have only seen it empty or containing a single id.

I do know that if you ban all parts, you will not get that vocab in any item (I did not check whether you instead get a full set of items with empty vocabIds, which would be interesting). Prior to using the API I had been using “ban all parts” as a way to say “this is so easy I never want to see it”. When I started using the API to extract lists of words I know, that became inconvenient, so I ended up mass-unbanning everything and then banning only most parts (usually leaving defn) so I could still fetch the list of words. That really messed up my reviews for about a week :frowning:

Thanks for the reply, Ben. I hadn’t tried playing around with vocab_fields, since the default was stated to be ‘all’. I tried setting it manually just now, though, and I’m still getting a 400 error response. The parameters I used were:

{'sort': 'changed', 'vocab_fields': 'id,writing,reading', 'limit': 100, 'include_vocabs': 'true', 'offset': 1539570772}

with the final URL being:

https://legacy.skritter.com/api/v0/items?sort=changed&vocab_fields=id%2Cwriting%2Creading&limit=100&include_vocabs=true&offset=1539570772
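
(For completeness, that URL is just urllib.urlencode applied to the params dict above – though the parameter order can come out differently, since Python 2 dicts are unordered.)

import urllib

params = {'sort': 'changed', 'vocab_fields': 'id,writing,reading',
          'limit': 100, 'include_vocabs': 'true', 'offset': 1539570772}
print 'https://legacy.skritter.com/api/v0/items?' + urllib.urlencode(params)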

If it works for you, though, then I’m probably doing something wrong; I just don’t see what it is yet.

(Also, I don’t know why vocabIds is a list, either; I’ve never seen it hold more than one entry for Japanese items.)

When I was getting started, I had to pull the text out of the 400 error response to get an explanation of the problem:

import sys
import urllib
import urllib2

try:
    # url, params, and credentials are set up elsewhere
    request = urllib2.Request(url + '?' + urllib.urlencode(params))
    request.add_header('AUTHORIZATION', credentials)
    response = urllib2.urlopen(request)
except urllib2.HTTPError as e:
    # the body of the 400 response holds the actual explanation
    error_message = e.read()
    print error_message
    sys.exit(1)

Interesting, I get the following information:

from __future__ import print_function  # needed under Python 2
import sys
import urllib2

try:
    # request is built exactly as in the snippet above
    response = urllib2.urlopen(request, timeout=60)
except urllib2.HTTPError as exc:
    print('Received error code "%s" from the server: %s' % (
        exc.code, exc.read()), file=sys.stderr)

Received error code "400" from the server: {"message": "limit must be between 1 and 30", "statusCode": 400, "error": "InvalidRequest"}

If I change the limit to 30 (or don’t specify a limit at all), it appears to work: the response is successful and I get a nice-looking handful of vocabs. I’ll look it over more carefully when I get a chance. However, the API documentation doesn’t say anything about a 30-item limit when include_vocabs is true.
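
In case it helps anyone else, here’s a sketch of the workaround I’m planning to use – just clamp the limit to 30 whenever include_vocabs is set (the 30 comes purely from the error message above, since the limit isn’t documented):

import json
import urllib
import urllib2

ITEMS_URL = 'https://legacy.skritter.com/api/v0/items'

def fetch_items_page(params, credentials):
    # Undocumented: with include_vocabs=true the server rejects
    # limits above 30, so clamp before sending.
    if params.get('include_vocabs') == 'true':
        params = dict(params, limit=min(int(params.get('limit', 30)), 30))
    request = urllib2.Request(ITEMS_URL + '?' + urllib.urlencode(params))
    request.add_header('AUTHORIZATION', credentials)
    return json.loads(urllib2.urlopen(request, timeout=60).read())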

Side note: I was going to make a separate bug post about this, but it bit me again here. The JSON response for vocabs does not do any processing of custom definitions. I’ve made a bunch of custom definitions, some of which include embedded newlines, and those come across unescaped in the response. That causes the JSON library I’m using (the standard Python one) to treat the content as malformed, so I have to preprocess it manually. I’m no JSON expert, but I assume this escaping would be better done server-side, both for correctness and potentially for security (unsanitized inputs and all that?).

Skritter’s own export functions produce broken output when there are newlines in definitions.

The sanitization should be easy. Any newline outside a string is optional whitespace, so you can simply join everything onto one line. If you want to be able to restore the newlines, you could instead replace each one with a dozen spaces or so (the spaces would be ignored outside of strings, and inside a string you could treat a long run of spaces as a newline).
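
Something like this is what I mean, as a client-side stopgap (twelve spaces is an arbitrary sentinel – anything unlikely to appear in a real definition works):

import json

def parse_lenient(raw):
    # Unescaped newlines inside strings make strict JSON parsing fail.
    # Replace every raw newline with a run of twelve spaces: outside
    # strings the extra whitespace is ignored, and inside strings the
    # run can be mapped back to a newline afterwards if needed.
    sanitized = raw.replace('\r', '').replace('\n', ' ' * 12)
    return json.loads(sanitized)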

