Characters Learned metric meaning?

Apomixis · June 4, 2020, 3:25am

What does the “characters learned” metric mean? I had thought it was a running total of the cumulative “characters” I have been exposed to.

However, this week, I finished a book I was learning from and the resource said that it exposed me to over 700 characters. However, my Skritter metrics say about 448 characters in total.

I thought that was odd, so I exported my Skritter lists for that resource, and 15 minutes of Matlab programming later, had a “unique character count” of 812 characters for that book. (Flattening all multicharacter entities to their individual characters and then doing a unique-sort combo on the concatenated set)
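For anyone who wants to reproduce that kind of count, here is a minimal Python sketch of the flatten-and-unique step. It is not the original Matlab script, which isn't shown here. The function name `unique_characters`, the sample entries, and the choice to keep only the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) are all assumptions for illustration.

```python
def unique_characters(entries):
    """Flatten multi-character entries and return the set of distinct characters."""
    chars = set()
    for entry in entries:
        # Assumption: keep only basic CJK ideographs so that stray spaces or
        # punctuation in an export do not inflate the count.
        chars.update(ch for ch in entry.strip() if "\u4e00" <= ch <= "\u9fff")
    return chars


# Hypothetical sample entries, not real export data.
entries = ["经验", "经历", "你好", "好"]
print(len(unique_characters(entries)))  # 5 distinct characters: 经 验 历 你 好
```

Using a set does the "unique" part implicitly, so no separate sort is needed unless you also want the characters in a fixed order.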

I know I’ve banned a couple dozen items, so my character “exposure” should be in the high 700’s.

Then I thought, “maybe Skritter is only talking about number of SINGLE character items I’ve added from lists”. 2 more minutes on Matlab and I was told this book had around 400 “single” character entities. (Throwing away any multi-character entries from the count). I’ve added a few on my own from other lists, so the 448 number “smells right”.

However, I thought the character count metric was telling me how many characters I’ve seen. But that doesn’t appear to be the case.

It would be nice to have a metric that tells me what my character exposure has been.

For example, I know 经验, 经历, 经济, 经过, and 已经, but it’s true that I don’t have 经 separately on any of my lists, so it looks like it would NOT be a character that appears on the character count metric. As such, I’m not really sure what use the current metric is, OR I don’t understand its intent.
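To make the difference concrete, here is a small Python sketch comparing the two ways of counting on exactly those five words. The function names are hypothetical, and this only illustrates the two definitions discussed in this post; it is not a claim about how Skritter computes its stats.

```python
def single_character_entries(items):
    """Items that are themselves exactly one character long."""
    return {item for item in items if len(item) == 1}


def exposed_characters(items):
    """Every distinct character appearing in any item, single or multi-character."""
    return {ch for item in items for ch in item}


words = ["经验", "经历", "经济", "经过", "已经"]

print("经" in single_character_entries(words))  # False: 经 was never added on its own
print("经" in exposed_characters(words))        # True: it appears inside all five words
print(len(single_character_entries(words)), len(exposed_characters(words)))  # 0 6
```

Under the first definition this list contributes nothing to a character count, while under the second it contributes six characters (经, 验, 历, 济, 过, 已).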

bezdomny · June 4, 2020, 9:37am

No one really knows, including the devs. It probably also means something different depending on which platform you see it.
At least to be counted it requires you to know it, not merely have been exposed to it. If you get cards wrong the number will go down.
I suspect it doesn’t matter if you have the character only as part of a multi-character word.
This text full of maybes comes after an attempt to figure this out with skritter support. Clearly, we failed.

Apomixis · June 4, 2020, 3:09pm

I had considered that the degree of “knowing” from the SRS algorithm might factor into whether a character counts as learned or not. I have my Retention setting at 97%.

However the gap between the 800+ characters on my list and the low 400 “characters learned” metric is too big to be explained by this, IMO. I haven’t marked that many characters wrong recently, and Skritter isn’t presenting me with several hundreds of reviews a day. So, Skritter appears to believe I know the characters based on what it presents for reviews.

Sometimes when I mark things wrong, I will see the number of characters learned and words learned shift up and down, but it’s usually +/- a half dozen or so… not hundreds.

Something just doesn’t pass the “sniff test”, so I’m trying to figure out what the metric is supposed to mean and be telling me.

SkritterMichael · June 5, 2020, 5:32pm

Hey, so you’re right that the “characters learned” metric only counts single-character vocabs you’ve added to your queue. See Skritter | FAQ for more details.

We’ve had internal discussions and even made mockups about updating how single characters are calculated in Skritter. I think (disclaimer: I had nothing to do with designing the original stats system, and none of the following are promises for features) the original spirit of it was similar to how the HSK says X words and Y characters, but actually calculating that across all your vocabs whether something you just learned contains a unique character was tricky to accomplish on the system at the time. Plus, as @bezdomny mentions,

Having said all that, we’ve been discussing changing the stats around to be something more like “You know sum(chars + words) words made up of unique(chars + words) characters.” It’s more intuitive and useful for a learner to gauge their level, especially in cases like you mentioned @Apomixis where a graded reader or book might mention its unique word count. Focusing on and overhauling our stats is on our radar, but I can’t give you a timeline when to expect anything yet.
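As a rough illustration of what a stat phrased that way could compute, here is a short Python sketch. It is only one possible reading of the wording above, not Skritter's implementation; the function name `level_summary` and the sample vocab are hypothetical.

```python
def level_summary(char_items, word_items):
    """Count all studied items, and the distinct characters they are built from."""
    items = list(char_items) + list(word_items)
    unique_chars = {ch for item in items for ch in item}
    return f"You know {len(items)} words made up of {len(unique_chars)} characters."


# Hypothetical vocab: one single-character item and two multi-character words.
print(level_summary(["好"], ["你好", "经验"]))
# You know 3 words made up of 4 characters.
```

Here 好 is counted once as a character even though it appears both on its own and inside 你好.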

bezdomny · June 6, 2020, 5:05am

I don’t think that’s true, @SkritterMichael. Single character vocabs I have added are less than a thousand and the characters learned metric shows 2000+. The 2000+ number is not very far off from the number of unique characters appearing in all my added vocab (single+multi).

You may want to find a thread in the team@skritter.com mailbox titled “custom export for forgotten words/characters” from March this year. Goal was not so much to pinpoint the exact number of characters learned, but rather to audit the SRS; why am I getting so few reviews despite my retention rate never getting to where it should be, which words/characters do I apparently keep forgetting?

Apomixis · June 6, 2020, 4:33pm

@bezdomny Do you have traditional turned on? I study only simplified characters. When I was doing my analysis, the results changed a lot if I used the traditional character column by mistake. The exported lists contain both sets of characters.

Edit: My analysis concurs with @SkritterMichael’s analysis, but I wonder if studying both simp/trad munges the metrics.

bezdomny · June 6, 2020, 5:16pm

Nope, simplified only. Never been otherwise. And the export contains only simplified (and a bunch of radicals). Also the “Add characters when adding words” setting is off.

Characters Learned is what shows under All Time stats, together with Words Learned, Time Studied and Days Studied, right?
And when you go to Legacy Skritter and Export Words (selecting All) the number of single character entries in that list matches the number of Characters Learned (minus a few forgotten ones)?

Apomixis · June 7, 2020, 1:18am

I have never tried exporting my words from the legacy site before, so I thought I would try and see what showed up.

When I exported from the legacy site I could pick characters or words, and it told me I had 446 characters when I exported the characters. And then when I exported words it told me I had 978 words exported.

I also exported “everything” on the legacy site and it told me there were 1424 items, which is the total of 446 and 978, so that matches.

I also used the current website (not the legacy website) to export all of my words, and it exported them and also had a list that had 1424 words (I did not include banned items). That total matches the legacy site.

I then ran my own analysis in Matlab with my script and it said that there were 446 single-character listings within my lists (which matches the above), and that all the items in my lists together (decomposing multi-character items) were composed of 863 unique single characters. In total, my analysis said I had 1424 unique list items, which again matches the earlier data.

If I now look at the stats on the modern iOS application, it says I have 448 characters learned and 975 words learned. So these numbers are in the ballpark of what the legacy site and the website exports show. However, it is odd that the modern iOS app says I have learned more characters than the character list contains in total. I’m not sure how the 448 in the statistics comes about, because the character list has only 446 characters on it.

So, that’s what my statistics show and what my lists show.

Edit: interestingly, on the iOS app I hit the refresh button on the stats page, and my number of characters learned changed to 446, but it says that I have -2 characters learned for the day. So even though my total number of characters is only 446 it says that I have now unlearned two of them, which is still strange!

Edit 2: The data sets from the legacy Skritter site were actually much cleaner and easier to process, because every line ended with a trailing tab, so you could always count on a fixed number of tabs within a single line of data. The export from the current website has all sorts of data oddities that are hard to work around without that trailing delimiter. For example, lots of words have carriage returns and line feeds inside their own definitions, so a single entry ends up spread across multiple lines in the export. I don’t know if the Skritter team did this on purpose for some other application, but it makes it really hard to process the current website’s exported lists cleanly without manually cleaning up the data first. The legacy Skritter data sets were easy to process even with all those extra carriage returns and line feeds, because you could always depend on the tab at the end of each data line as a delimiter.
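Here is a minimal Python sketch of why the trailing tab helps. It assumes a hypothetical layout in which every field, including the last one on a line, is followed by a tab and every record has the same number of fields; the three-column sample (word, pinyin, definition) is made up and is not the real export format. Splitting on tabs instead of on line breaks means line breaks inside a definition no longer split a record.

```python
def parse_records(text, fields_per_record):
    """Parse tab-terminated fields into records, tolerating line breaks inside fields."""
    tokens = text.split("\t")
    # Whatever follows the final tab is just the last line ending; drop it if blank.
    if tokens and tokens[-1].strip() == "":
        tokens.pop()
    if len(tokens) % fields_per_record != 0:
        raise ValueError("field count is not a multiple of fields_per_record")
    records = []
    for i in range(0, len(tokens), fields_per_record):
        record = tokens[i:i + fields_per_record]
        # The first field of a record still carries the previous record's line ending.
        record[0] = record[0].lstrip("\r\n")
        records.append(record)
    return records


# Hypothetical sample: the first definition contains an embedded CR/LF.
sample = "好\thǎo\tgood;\r\nfine\t\r\n你好\tnǐ hǎo\thello\t\r\n"
for record in parse_records(sample, 3):
    print(record)
```

This approach still breaks if a field itself contains a tab, and without the trailing tab there is no reliable way to tell an embedded line break from the end of a record.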

bezdomny · June 7, 2020, 3:55am

At least it’s transparent for your setup.

Therebackagain · June 7, 2020, 7:08pm

I have to trust Skritter’s numbers for what I’ve actually learned, for there’s really no other way to know at this point without taking all the HSK tests. I hope they are accurate.

For me, the legacy and mobile beta app characters learned number both line up at 4120 characters, but the beta app is counting written characters in that count, which is deceiving because I study both traditional and simplified characters.

I think the more reasonable number is 3250 characters (although I surely don’t remember them all!), which is a number I can only get from the legacy app, because stats for characters/words learned there are broken down into definitions, tone, pinyin and writing. The definitions number is in my experience the most accurate, especially when learning both versions.

I’m hoping all the great features of the legacy website will be brought forward into the new website, because the progress graphs there are way more informative and useful.

Similarly, the progress charts on the legacy mobile app had extremely useful similar breakdowns. I hope to see these on the new app soon. (Although there are new progress indicators I also appreciate, especially the immediate summary following a review session.)

SkritterMichael · June 9, 2020, 1:24pm

@bezdomny, it’s hard to say how the numbers got the way they did, especially for users who have used Skritter for a while on different clients. On some older clients, studying a word like 你好 would add attempts for both 你 and 好 as individual characters and so in theory you could learn “你” without ever adding it. This is disabled on newer clients, so the numbers will reflect only what you’ve explicitly added.

@Apomixis we’ve already got an issue open for improving the vocab export on the new website. I’ll add a note to look into the tabs on exported files. Thanks for pointing that out.

@Therebackagain when we overhaul stats, there will be graphs!

meiadeleite · November 20, 2020, 9:52pm

I apologize for reviving this topic after so long.

I’m studying for an HSK4 next month, which means I should know 1200 words if I study the 4 decks that make up the vocabulary for the exam. At the moment, I’m 10-15 words short of finishing the last deck, but according to my Progress page, the number of words I learned is just 888. I cannot quite understand this difference… Is it the mismatch mentioned above? Has the app judged that I don’t know the 300 or so character difference well enough? Am I missing a secret deck that I forgot to look at?

Apomixis · November 20, 2020, 10:08pm

The best way I currently know of seeing which words are known, and which of the types of tests (writing, definition, tone, etc.) are known, is to go to the legacy Skritter site (not the new web version) and look at each of your lists section by section. The legacy system shows a little green square for each test that you have in progress, color-coded by the darkness of the green to show how well you know it. I have periodically found that there are some things I’ve “learned” with the latest iOS app for which it didn’t add all of the different types of tests to the system. So I’ll find that I’ve been doing the writing but that none of the other tests got added on the new app.

Here’s what it looks like (also shows 3 words I haven’t studied/learned yet)

meiadeleite · November 21, 2020, 3:44pm

Thank you for this tip — progress is really easy to see on a table like that!

I’ve been using the new iOS app exclusively for quite some time now, and sadly it looks like these tables are not being updated for my account on the legacy site. Is there a way to force a sync?

Apomixis · November 21, 2020, 4:34pm

My understanding is that the new app does not have list sections being learned individually like the old app did, so the page that you show with your diagram doesn’t get updated anymore because it doesn’t have any meaning on the new app. My lists show the same thing that you see with individual sections not marked, even though “all” the words are learned (see 2nd image below). Skritter’s internal mechanisms have changed and the old “adding”, “done” concepts don’t appear to be meaningful anymore.

If you click on each section that you see in the image you showed, then you’ll see the chart that shows the individual word displays. Those will be updated. You can look at the timing to verify they are updated. Do this by looking inside each section at the word details. If you click on an individual word you can see the timing of when the word was last given to you and when the next delivery of that test will be. (See 1st image below, look in the table with the blue header bar. “Next” says when that test will next appear, and “Last” says when Skritter last gave you that test)

I also only use the latest iOS app for any real activities, but for looking at certain database issues related to words, the legacy app has tools that still haven’t been migrated over to the iOS app.