[Skritter API] Meaning of "Toughness" field in the Vocab entity

I am trying to use Skritter’s API to write a script that creates VocabLists based on the kanji the user has already learned. To do that, I am thinking of using the “toughness” field to decide which words to add first.

Could you please explain how the “toughness” value was decided for each word?
Does it correlate with JLPT levels, or with the word’s rank in some frequency list? Or was it decided on an ad-hoc basis by ballers?

I was not part of the original team that implemented the toughness property, but I did find an old newsletter from 2011, when the feature was added.

Toughness Indicator
Ever wonder how useful a particular word is to your learning? We did too, which is why we created a “toughness” scale based on frequency, importance in textbooks, and difficulty. Check it out in the word popup and read more about it on the blog.

It sounds like it was a custom script, likely lost to time, that was mostly based on frequency, with other factors such as number of strokes and how often the word showed up in the vocab lists we had available at the time.

I’ve never found toughness super helpful. It’s all contextual to the individual learner anyway. The exception is research on Chinese language acquisition, which generally finds that the more strokes a character has (often called character density), the harder it is to learn.

My advice: ignore it.

These vocab lists sound awesome and should play very, very nicely with some of the new study modes we’re hoping to include in the Skritter Mobile experience before too long.

Good luck on the project and keep us updated!


Thank you, Josh, for the information :slight_smile: It’s very helpful!
I was toying with the same kind of signals to generate a priority value, so it seems I don’t need to do it myself.

Hi Jake,
Thanks for the great feedback, I will keep that in mind.

My end goal is to build a plugin that dynamically adds new vocabulary using characters that I have already learned.
As a start, I generated a VocabList that emulates this for the RTK1 book (List#: 5344356491984896)
Studying it alongside RTK1, I noticed a very positive impact on my character retention rate.
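A minimal sketch of the unlocking idea, assuming the learned characters are given in study order (e.g. the RTK1 sequence) and that kanji can be detected with the basic CJK Unified Ideographs range U+4E00 to U+9FFF (a simplification that ignores the extension blocks). `unlock_order` and `is_kanji` are hypothetical helpers, not part of Skritter's API: the function maps each word to the position in the character order at which its last kanji is learned.

```python
def is_kanji(ch: str) -> bool:
    # Simplification: only the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def unlock_order(char_order: str, words: list[str]) -> dict[str, int]:
    """Map each word to the index in char_order at which it becomes studyable.

    Kana (okurigana, particles) are ignored; words containing a kanji that
    never appears in char_order are left out entirely.
    """
    position = {ch: i for i, ch in enumerate(char_order)}
    unlock_at = {}
    for word in words:
        kanji = [ch for ch in word if is_kanji(ch)]
        if kanji and all(ch in position for ch in kanji):
            # The word unlocks once its last-learned kanji is reached.
            unlock_at[word] = max(position[ch] for ch in kanji)
    return unlock_at


print(unlock_order("一二三人大", ["一人", "大人", "三つ", "日本"]))
# → {'一人': 3, '大人': 4, '三つ': 2}
```

Counting how many words share each index shows how many words unlock per newly learned character, which makes a mid-list spike easy to spot.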

The problem is that around the middle of the RTK list, the number of unlocked vocabs explodes, so I am trying to filter out the less important ones.
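One way to tame the explosion, sketched under assumptions: given a mapping from each word to the step at which it unlocks, keep only the few most frequent words per step. The frequency ranks are assumed to come from whatever list you trust (1 = most frequent); `cap_new_words` and its `per_step` parameter are hypothetical names.

```python
from collections import defaultdict


def cap_new_words(unlock_at: dict[str, int],
                  freq_rank: dict[str, int],
                  per_step: int = 3) -> list[str]:
    """Keep at most `per_step` words per unlock step, most frequent first.

    Words missing from freq_rank are treated as the least frequent, and
    ties are broken alphabetically so the output is deterministic.
    """
    by_step = defaultdict(list)
    for word, step in unlock_at.items():
        by_step[step].append(word)

    kept = []
    for step in sorted(by_step):
        ranked = sorted(by_step[step],
                        key=lambda w: (freq_rank.get(w, float("inf")), w))
        kept.extend(ranked[:per_step])
    return kept
```

A fixed cap is the bluntest option; a cap that grows with the number of words unlocking at a step would thin the busy middle of the list less aggressively.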

Currently, I draw vocabs from an aggregate of Skritter’s JLPT4~1 lists, so I am thinking of switching to a more compact vocab list. Do you have a recommendation for a list with good coverage of the Joyou characters?

Also, don’t you think such a plugin would be a good feature candidate for Skritter? You’d save me a lot of work, is all I am saying :wink:

I think it would be a killer feature. That said, it is such a low priority right now that it could be years before we get to it unless we see serious growth after releasing the updated mobile apps.

It might be interesting enough on a personal level that @SkritterMichael and @josh could do some weekend hacking with you on the script, but we can’t build too many new features that we don’t already have planned until the apps are out in the store and we’re seeing growth.

I do know a little about the difficulty script! It’s a special-sauce number that takes into account frequency on some curated lists (e.g. HSK for Chinese, plus some other learner-friendly corpora), but also a little data on how well Skritter users did at writing that character. So if a lot of Skritter users had trouble with some complicated character, its difficulty would go up. However, when it was run years and years ago, there was neither (1) as complete a set of vocabs in the Skritter database nor (2) as much study data available, so the resulting toughness values are pretty inaccurate. I wouldn’t trust the rating to indicate much, and we should probably replace it with some other indicator. However, because it isn’t used to determine anything and is really just a factoid on the vocab info screen, it’s not a high priority.
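The original script is gone, so the following is only an illustration of the kind of blend described above, not Skritter's formula: it combines a log-scaled frequency rank from a curated list with the fraction of reviews that users got wrong. The function name `toughness`, the weights `w_freq` and `w_err`, and the `max_rank` cutoff are all arbitrary assumptions.

```python
import math


def toughness(freq_rank: int, error_rate: float,
              w_freq: float = 0.5, w_err: float = 0.5,
              max_rank: int = 10_000) -> float:
    """Hypothetical difficulty score; higher means tougher.

    freq_rank:  rank on a curated frequency list (1 = most common).
    error_rate: fraction of user reviews answered wrongly, in [0, 1].
    With the default weights the result lies in [0, 1].
    """
    if freq_rank < 1:
        raise ValueError("freq_rank must be >= 1")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    # Log scale: rank 1 -> 0.0, ranks at or beyond max_rank -> 1.0.
    rarity = min(math.log(freq_rank) / math.log(max_rank), 1.0)
    return w_freq * rarity + w_err * error_rate
```

As the post notes, a score like this is only as good as its inputs: a sparse vocab database or little review data gives unreliable values regardless of the formula.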

Internally, we’ve been testing some analysis tools for suggested vocabs and lists that we use to design and keep track of content (e.g. if you know 土 and 也, then 地 should be an easy character to learn; if you know 地 and 球 but not 地球, that would be another good suggestion), but running them on all the vocabs for all our users would not be very performant. So they need some optimization before they’re public-facing. But I would like to get something like that to users eventually!
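A toy version of the two suggestion rules in the parenthetical above, assuming a character decomposition table is available. The `components` mapping and the `suggest` function are hypothetical; Skritter's internal tools are not public, so this does not reflect how they work. It suggests unknown characters whose components are all known, and unknown words whose characters are all known.

```python
def suggest(known: set[str],
            components: dict[str, tuple[str, ...]],
            words: list[str]) -> list[str]:
    """Return characters and words that should be easy to learn next.

    known:      characters and words the learner already knows.
    components: hypothetical decomposition table, char -> component chars.
    words:      candidate multi-character words.
    """
    # Rule 1: an unknown character built entirely from known components.
    chars = [c for c, parts in components.items()
             if c not in known and parts and all(p in known for p in parts)]
    # Rule 2: an unknown word made entirely of known characters.
    compounds = [w for w in words
                 if w not in known and all(ch in known for ch in w)]
    return chars + compounds


table = {"地": ("土", "也"), "球": ("王", "求")}
print(suggest({"土", "也"}, table, ["地球"]))  # ['地']
print(suggest({"地", "球"}, table, ["地球"]))  # ['地球']
```

This version scans every candidate for every call, which is exactly the cost that becomes a problem when run over all vocabs for all users; indexing candidate words by character would likely be a first optimization.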
