Is there any hope for more accurate list progress?

This is definitely just a nice-to-have… but for me it’d be a very nice one. I use a lot of different lists with Skritter, either to add words to my general queue or just to keep track of what I’d like to study in the future. The downside is that I have no clue how many words in a list I actually still need to study. For example: I made a list of every word in a book I’d like to read, taking out HSK and TOCFL words… it still has 5000 words. Spot checking, I already know a great many of them (this is to be expected – it’s every word, after all!). I’d love to have an accurate count of the words in this book that I still need to learn, though… and this applies to all of my lists, really. It’d be really heartening to have an accurate count, and it could give me a better sense of what I’m working towards.

I might be one of the few people who work in this way, but for example, there are a lot of frequency lists… one I’ve long had my eye on is the 6000 words frequency list from newspapers. I’d really love to know what percentage of this list I already have covered!

Thanks. <3 skritter!

Hi, this is a feature we’ve discussed as a team and all want. Unfortunately, the way the system is currently set up, getting that count would be possible but very slow. It would work for a few users, but I’m not confident it would scale to our entire userbase without causing service problems. Our path forward around this issue is a data migration (in progress) to a new database that fits our use case better and will make features like this possible. So this probably won’t happen in the next month, but once it’s feasible, it is a feature we want you to have!


It’s great to know that this is on your radar…thank you for the response!

That said, just thinking out loud: would this be prohibitively expensive as a batch job, as opposed to something that is constantly online? It seems like it would just be a matter of updating every user’s lists… you put the words they’ve studied into memory, and then iterate over the lists. All of these values are fairly small; I guess it’s just a question of what format the lists and studied words are stored in, and whether that makes it particularly expensive.
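To make the batch idea concrete, here is a minimal sketch in Python. It assumes a user's studied words and each list's words can be loaded as plain strings; how Skritter actually stores them is unknown, and `list_progress` and the sample data are made up for illustration.

```python
def list_progress(studied, vocab_lists):
    """Return {list name: (known, total, percent)} for each list."""
    progress = {}
    for name, words in vocab_lists.items():
        unique = set(words)                 # ignore duplicates within a list
        known = len(unique & studied)       # words the user has already studied
        percent = 100.0 * known / len(unique) if unique else 0.0
        progress[name] = (known, len(unique), percent)
    return progress


# Hypothetical data: one user's studied words and two of their lists.
studied = {"你好", "谢谢", "学习", "报纸"}
vocab_lists = {
    "newspaper-frequency": ["报纸", "经济", "学习", "政府"],
    "book": ["你好", "谢谢", "月亮"],
}

result = list_progress(studied, vocab_lists)
for name, (known, total, percent) in result.items():
    print(f"{name}: {known}/{total} ({percent:.0f}%)")
```

The set intersection itself is cheap; whether the job as a whole is cheap depends on how expensive it is to load the studied words and lists in the first place.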

Another option is to try something like this:

You could maintain a bloom filter for every list, and for everyone’s progress. Then you can estimate the intersection of two sets!

This has a lot of nice properties:

  • really cheap
  • easy to regenerate
  • you only need one per list, and if someone forks a list, you just copy the bloom filter.

Downside:

  • if they delete stuff, you need to regenerate the bloom filter. A cheap fix: keep one bloom filter per section. Then you only need to regenerate the section that was edited and merge it with the filters of the sections that weren’t.

I feel like this would be a really nice solution in the short term, because migrating databases is a complex and time consuming thing…
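Here is a rough sketch of what I mean, in Python. Everything in it is an assumption for illustration: the `BloomFilter` class, the sizes `m` and `k`, and the fake word data. The overlap estimate uses the usual trick of estimating each set's size from the number of set bits, then computing |A| + |B| − |A ∪ B|, where the union of two filters is just a bitwise OR (this only works if all filters share the same `m`, `k`, and hash functions).

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m=1 << 16, k=4, bits=0):
        self.m, self.k, self.bits = m, k, bits  # bits is stored as one big int

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def union(self, other):
        # Merging filters (e.g. one per section) is a bitwise OR.
        return BloomFilter(self.m, self.k, self.bits | other.bits)

    def estimate_size(self):
        set_bits = bin(self.bits).count("1")
        if set_bits >= self.m:
            return float("inf")  # filter is saturated, no estimate possible
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)


def estimate_overlap(a, b):
    # |A ∩ B| ≈ |A| + |B| - |A ∪ B|
    return a.estimate_size() + b.estimate_size() - a.union(b).estimate_size()


# Hypothetical list with two sections of 500 words each.
section_1, section_2 = BloomFilter(), BloomFilter()
for i in range(500):
    section_1.add(f"w{i}")
    section_2.add(f"w{i + 500}")
list_filter = section_1.union(section_2)

# Hypothetical user who knows 400 of the list's words plus 600 others.
user_filter = BloomFilter()
for i in range(400):
    user_filter.add(f"w{i}")
for i in range(600):
    user_filter.add(f"x{i}")

overlap = estimate_overlap(list_filter, user_filter)
print(f"estimated overlap: {overlap:.0f} (true value is 400)")
```

The result is only an estimate, and it gets worse as the filters fill up, so `m` would need to be sized for the largest lists.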


I have also wanted this for a long time, because I work similarly:
I have my “main” big lists like HSK that I want to cover long term. Short term, I’m adding small lists like ChinesePod lessons, graded readers etc. It would be really cool to see how much of a big list one has already covered.

Here is my manual workaround, all on the legacy page:

  1. Download the .csv export of the list, for which you want to see your “real” progress, from the list’s page
  2. Fetch all words you have added to your total stack so far from the “my words” page
  3. Add both into one excel sheet (just the columns of the characters are needed, delete all lines that are not characters)
  4. Use “countif” to see if the list’s words are already added to your stack

I know “added words” are not necessarily “learned words”, but it is close enough for me.
Similarly, you can compare your total stack of added words to any list you can get your hands on, not necessarily one exported from Skritter (frequency lists, book vocabulary lists, etc).
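If you would rather script it than use a spreadsheet, the same comparison can be sketched in Python. This is only an illustration: the column name `word` and the inline sample data are assumptions, so adjust them to whatever the actual export looks like (and read from real files with `open(..., encoding="utf-8")` instead of `io.StringIO`).

```python
import csv
import io


def read_words(csv_file, column):
    """Collect the non-empty values of one column from a CSV with a header row."""
    words = set()
    for row in csv.DictReader(csv_file):
        word = (row.get(column) or "").strip()
        if word:
            words.add(word)
    return words


# Stand-ins for the list export and the "my words" export (made-up data).
list_csv = io.StringIO("word,pinyin\n报纸,bao4zhi3\n经济,jing1ji4\n学习,xue2xi2\n")
stack_csv = io.StringIO("word\n学习\n报纸\n你好\n")

list_words = read_words(list_csv, "word")
my_words = read_words(stack_csv, "word")

covered = list_words & my_words   # same idea as the "countif" check
missing = list_words - my_words   # what is still left to add

print(f"{len(covered)}/{len(list_words)} words already added")
print("still missing:", ", ".join(sorted(missing)))
```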


That workaround totally works! The last time I tried to export my total stack I remember it took ages, but that was a long time ago. If it’s reasonably fast I think that’s fine 🙂
