Hi! I have some new questions about the Beta API:
Is the API documentation available in any easy-to-parse format, like JSON schema or anything else? (so that I can make a quick wrapper for the API, instead of copy-pasting from the webpage)
The vocabupdates endpoint documentation mentions the fields parameter (for Vocabs!), but it doesn’t seem to work (neither for Vocab fields, nor for
It wasn’t very obvious to me that to add a new word for a user (like Quick Add on the legacy website), I need to modify some list. I thought that instead of using Vocabs (which are neutral common entities), I had to do something with Items, and therefore I spent some time trying to use POST on the items endpoint (it says “Creates new Items”).
So I’d just suggest clarifying this a bit somewhere in the docs, and explaining in general what the conceptual difference is between
Related question: is there any simple way to Quick Add a word? Or otherwise, how does it work exactly? I suppose that it takes the latest custom user list and chooses some section in it. But it takes several requests to do, so I thought that there might be a simpler way.
From responses I see that if a Vocab has an audio attribute, it also has audioURL, which is the same, and a more interesting audios, which is an array of objects (shown below). This is undocumented, and before asking any questions about it, I wonder if it’s something that is going to stay in the API.
- Authorization page mentions an option to get an “anonymous” access token. Does it mean that you can do basically everything, except adding new items to the queue?
Get a token with no specific user authorization. The API calls you make and data you receive is of course limited to data that’s accessible to all Skritter users, but depending on your needs this may be the simplest solution.
- Is there any simple way to get all Vocab IDs that a user has learned? As I understand, requests to the vocabs endpoint are influenced by the user settings, but there is no reference to whether a user has this word in a queue, or has ever learned it. So probably I should call
ids_only parameter and then extract Vocab IDs from them (still have questions about it). Is this correct?
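The extraction step described above could look roughly like the following sketch. It assumes each Item returned by the items endpoint carries a `vocabIds` list; that field name and the sample IDs are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Iterable


def extract_vocab_ids(items: Iterable[dict]) -> list[str]:
    """Collect unique Vocab IDs from Item objects, keeping first-seen order."""
    seen: set[str] = set()
    ordered: list[str] = []
    for item in items:
        # "vocabIds" is an assumed field name; items without it are skipped.
        for vocab_id in item.get("vocabIds", []):
            if vocab_id not in seen:
                seen.add(vocab_id)
                ordered.append(vocab_id)
    return ordered


# Hypothetical Items, shaped the way the sketch assumes.
sample_items = [
    {"id": "item-1", "vocabIds": ["vocab-a", "vocab-b"]},
    {"id": "item-2", "vocabIds": ["vocab-b", "vocab-c"]},
    {"id": "item-3"},
]
print(extract_vocab_ids(sample_items))
```

Deduplicating matters here because several Items (for example, different study parts of the same word) may point at the same Vocab.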
Thanks for making this API publicly available!
And I understand that it’s still in beta and may change at any moment, so some of these questions probably don’t have a proper answer…
Thanks for the answers, @josh!
- OK, although I think currently Quick Add on the legacy website adds it to the most recently used user list. I don’t know whether it chooses the last section or the one the user is currently adding words from (currentSection attribute), but I can try and figure that out.
- The problem with audioURL is that it offers only one pronunciation, while a syllable may have multiple readings (though there may not be audio for every reading). So if it’s fine, I’m going to rely on the audios property. Also, I just discovered that there is an undocumented endpoint http://beta.skritter.com/api/v0/audio, which I can query by reading. Is it OK to use it?
I don’t think the audios property will ever contain individual syllables for multiple-character words, but rather all the recordings we have for that word. If you want to get the individual recordings you’ll need to fetch the individual character vocabs (or use the audio thing you discovered below). I think there might be a hidden parameter called include_contained that does this automatically.
The api/v0/audio endpoint was originally designed for our internal usage with our client for recording more audio, but you can use it for querying by reading.
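A minimal sketch of building such a query URL follows. The `reading` parameter name is taken from the description above; the `bearer_token` parameter name and the placeholder token are assumptions about how the request is authorized, so adjust them to whatever the Authorization page specifies.

```python
from urllib.parse import urlencode

# Endpoint URL as quoted in this thread.
AUDIO_ENDPOINT = "http://beta.skritter.com/api/v0/audio"


def build_audio_query_url(reading: str, token: str) -> str:
    """Return a URL that queries the audio endpoint for one reading."""
    # "bearer_token" is an assumed parameter name, not confirmed here.
    query = urlencode({"reading": reading, "bearer_token": token})
    return f"{AUDIO_ENDPOINT}?{query}"


print(build_audio_query_url("hui4", "YOUR_TOKEN"))
```

The sketch only constructs the URL; sending the request and handling the response are left out because the response format is not described in this thread.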
I know this; I meant a different thing, though: some characters (and probably words?) have multiple readings, for example,
reading: "hui4, kuai4" and
audioURL refers to the file for
audios has both pronunciations:
"reading": "hui4, kuai4",
An unrelated general question about API: have you considered providing a GraphQL interface for the database?
I have no idea, of course, how you store the data and how feasible it would be. It just seems that with the current REST API any meaningful application requires making a chain of several ping-pong requests to different endpoints. If I understand it right (I’ve never worked with it before), a GraphQL API would allow the client to shape the data it wants in the request query.
Ah, looks like I misread your comment regarding the audioURL and audios. Yes, the audios will contain readings for characters that have multiple readings, and it will also include any duplicates we have from different speakers (described in the source property).
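As a rough illustration of working with that array, the sketch below splits a combined reading string and groups recordings by reading. It assumes each entry in audios has its own `reading` field next to `source`; the `mp3` field, the speaker names, and the file names are hypothetical.

```python
from collections import defaultdict


def split_readings(reading: str) -> list[str]:
    """Split a combined reading such as "hui4, kuai4" into single readings."""
    return [part.strip() for part in reading.split(",") if part.strip()]


def group_audios_by_reading(audios: list[dict]) -> dict[str, list[dict]]:
    """Group audio entries by their (assumed) per-entry "reading" field."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for audio in audios:
        grouped[audio["reading"]].append(audio)
    return dict(grouped)


# Hypothetical entries; only "source" is named in this thread.
sample_audios = [
    {"reading": "hui4", "source": "speaker-a", "mp3": "hui4-a.mp3"},
    {"reading": "kuai4", "source": "speaker-a", "mp3": "kuai4-a.mp3"},
    {"reading": "hui4", "source": "speaker-b", "mp3": "hui4-b.mp3"},
]

grouped = group_audios_by_reading(sample_audios)
for reading in split_readings("hui4, kuai4"):
    print(reading, [a["source"] for a in grouped.get(reading, [])])
```

Grouping first makes it easy to notice readings that have no recording at all, and to pick one speaker when duplicates exist.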
I haven’t looked into GraphQL too much, but just glancing at the website again it looks pretty cool. Right now it’s probably not feasible because our data is mostly running from Google Cloud Datastore (https://cloud.google.com/datastore/). It was a good choice about 10 years ago, but it’s kind of a slow, expensive dinosaur in this day and age. We’re planning on moving to MongoDB in the nearish future, which should increase our query flexibility quite a bit.
Interesting. I wasn’t familiar with Google Cloud-based DB options. It seems that this Cloud Datastore is generally not a bad option, but you say it’s not a good fit for Skritter data… Another similar option I see is the Firebase Realtime Database (EDIT: or is it just for caching?), which seems to be fast and to have a simple (JSON-based) API, but probably isn’t a good option due to the scale/price.
By the way, how big is your data? (if it’s not a secret, of course)
The datastore thing is not great for a slew of reasons. I guess for smaller apps with more static data it might be alright, but for larger apps with lots of moving components it’s rather lacking. 1.) They don’t provide any feasible built-in backup solutions and charge you for using their hacked-together one. For a company with any amount of data this can cost thousands of dollars for a single full backup and take hours. 2.) In comparison to other NoSQL databases it’s very limited in how you can query and access data. 3.) We’ve noticed that even for some simple queries there is just some inherent lag when compared with other services. It might only be 50-200ms extra, but at a large scale of requests that is a huge performance hit.
The realtime database is cool if you’ve got things like leaderboards, but it would be a bad choice for storing large amounts of data. It’d essentially be like the datastore, but even more limited.
OK. I see your reasons. Thanks for explaining.
You mentioned the scale of the data, but didn’t state the size explicitly, which makes me even more curious about it.