Very Good Passive Vocab Strategy

Recently found this:


My method is:

  1. Download the subtitles using this greasemonkey script
  2. Drop the subtitle file into CTA
  3. Copy the unknown words into Purple Culture’s vocab list generator
  4. Copy the resulting list into Google Docs
  5. Download as CSV
  6. Import into Anki
  7. Add audio with AwesomeTTS

This may seem like a lot of steps, but it only takes a couple of minutes. I wouldn’t add all the unknown words, just enough to get to 90% or more comprehension. There are a few more steps in there that are specific to my process, like adding tags, but that’s the basic process.
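To make the "just enough to get to 90%" idea concrete, here is a rough Python sketch. It is not how CTA or any of the tools above work internally. It assumes the subtitle text has already been split into words (`tokens`) and that you have a set of words you know (`known`); the function name `words_to_reach` and the sample data are made up for illustration.

```python
from collections import Counter


def words_to_reach(tokens, known, target=0.90):
    """Pick the most frequent unknown words until `target` coverage is reached.

    tokens: list of words from the subtitles (assumed already segmented).
    known:  set of words you already know.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    # Running count of word occurrences that are already understood.
    covered = sum(n for word, n in counts.items() if word in known)
    picked = []
    # most_common() yields words from highest to lowest count.
    for word, n in counts.most_common():
        if covered / total >= target:
            break
        if word in known:
            continue
        picked.append(word)
        covered += n
    return picked


# Hypothetical, already-segmented subtitle text.
tokens = ["我", "喜欢", "电影", "我", "喜欢", "龙", "龙", "龙", "城堡", "我"]
known = {"我", "喜欢"}
to_learn = words_to_reach(tokens, known, target=0.90)
print(to_learn)
```

With real subtitles the word splitting is the hard part, which is what a tool like CTA handles for you; the sketch only covers the selection step.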


Basically all you need is the following:
For step #1 above, there are Greasemonkey scripts you can download for Disney Plus and Netflix (and probably YouTube, etc.).

  1. Get access to the legacy Skritter website and download all your words as an Excel file
  2. Import your Excel words into Chinese Text Analyzer
  3. Download the subtitles using this greasemonkey script
  4. Save as a CSV
  5. Drop the subtitle file into CTA

Basically, Chinese Text Analyzer then compares the words you know with the words in the subtitle text, and the words you don’t know “fall out”.
It then ranks those words by how often they occur in the movie, from most spoken to least spoken. It also gives you feedback, e.g. it will tell you that you know 92% of the words in this script.
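For anyone curious what that comparison amounts to, here is a minimal Python sketch of the idea. It is only an illustration, not Chinese Text Analyzer's actual implementation. It assumes the subtitle text is already segmented into words, and the name `analyze` and the sample data are hypothetical.

```python
from collections import Counter


def analyze(tokens, known):
    """Return (unknown words ranked by frequency, share of known words).

    tokens: list of words from the subtitles (assumed already segmented).
    known:  set of words you already know.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Unknown words, most frequently spoken first.
    unknown = [(word, n) for word, n in counts.most_common() if word not in known]
    unknown_total = sum(n for _, n in unknown)
    known_share = 1 - unknown_total / total if total else 0.0
    return unknown, known_share


# Hypothetical, already-segmented subtitle text.
tokens = ["我", "喜欢", "电影", "我", "喜欢", "龙", "龙", "龙", "城堡", "我"]
known = {"我", "喜欢"}
unknown, known_share = analyze(tokens, known)
for word, n in unknown:
    print(word, n)
print(f"You know {known_share:.0%} of the words in this script")
```

Note that the percentage here counts every occurrence of a word, so a frequent unknown word drags the number down more than a rare one.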

I haven’t found a more powerful tool when it comes to being able to just “relax” and watch a movie in Chinese, knowing that it will only take me a couple of seconds to get all of these statistics and the words/chengyu I don’t know from the software. So the passive vocab strategy is basically just being exposed to a bunch of words by watching anything, knowing you can make it as active as you want on your next rewatch by processing however many new words you think necessary.

I’m currently using the Wenlin dictionary to determine “relative frequency”. However, I was hoping someone had either a website or an Excel file where you can easily look up the frequency of a word. I know of some that are out there, like the one below.

But these files are confusing. For example, many words are listed as ranked “11”, like 50 times. I’m sure someone would understand that. But in lieu of this, I was hoping to find a straightforward Excel file with maybe the top 50,000 words, so that 1) I can vlookup the file without breaking my spreadsheet and 2) I can understand how the words are ranked.

For anyone interested:

If you click into each file, you can download a CSV file. Then convert them to Excel and copy and paste them together to create one list. This is good for 1) “vetting” new words to see how frequent they are before bothering to learn them, and 2) finding the gaps in your vocab: do a vlookup of your known words against the 56k list, and any gaps should fall out.
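If you would rather not do the vlookup in a spreadsheet, the same two checks can be sketched in Python. This is a sketch under assumptions: I am assuming the combined CSV has one word per row in the first column, sorted from most to least frequent, with no header row, which may not match the real files. The names `load_ranks` and `vocab_gaps` and the tiny sample list are made up.

```python
import csv
import io


def load_ranks(csv_file):
    """Map each word to its rank (1 = most frequent), taken from row order."""
    ranks = {}
    for row in csv.reader(csv_file):
        if not row or not row[0].strip():
            continue  # skip blank rows
        word = row[0].strip()
        # Keep the first (best) rank if a word appears twice.
        ranks.setdefault(word, len(ranks) + 1)
    return ranks


def vocab_gaps(ranks, known, top_n):
    """Words within the top `top_n` ranks that are not in `known`."""
    ordered = sorted(ranks.items(), key=lambda item: item[1])
    return [word for word, rank in ordered if rank <= top_n and word not in known]


# Tiny stand-in for the real list; with a real file you would use
# open("your_list.csv", encoding="utf-8", newline="") instead.
sample = io.StringIO("的\n是\n电影\n城堡\n")
ranks = load_ranks(sample)

print(ranks.get("电影"))                      # "vetting" a new word
print(vocab_gaps(ranks, {"的", "电影"}, 3))   # gaps in the top 3
```

A word that is missing from the list simply comes back as `None` from `ranks.get(...)`, which is the equivalent of a vlookup returning #N/A.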

Don’t know what it is based on but “zero to hero” is a big name in the language learning community.


Just fyi to anyone interested…it is a really good list.

It is this one:

  • Draft for modern Chinese word set for common use [Chinese title not rendered: broken images pone.0010729.e003.jpg and pone.0010729.e004.jpg] (2008), compiled by the State Language Commission of China [9]. This list contains 56,008 frequency-ranked words, the frequencies of which are based on a segmented part of 45 million characters from the Chinese (General) Balanced Corpus, a segmented corpus of 135 million characters based on People’s Daily 2001-2005, and a modern Chinese literature corpus of 70 million characters constructed by Xiamen University. The word frequencies themselves, however, are not yet publicly available.

I think something went wrong with your post. The link you shared doesn’t work, and the preview shows information that doesn’t match the text you posted below it. The text you posted also contains broken image links and symbols whose meaning I can’t work out. There is also a reference [9] in your text that doesn’t lead anywhere.

In any case, I’ve worked a lot with the SUBTLEX-CH corpus and built decks that show high-frequency words in that list that are omitted from HSK. You can check it out here (also contains links to Skritter decks for studying this in our apps):

For more frequency stuff, also check:


This webpage has a table labeled “Word frequency lists of Chinese”.
One of the lists is: Draft for modern Chinese word set for common use
The complete list is available in the Reddit post made by zerotohero:

You can create an Excel file of this list pretty easily if you just click around. There is a text file you can copy and paste into Excel. I would just post the Excel list here in Skritter, but Skritter doesn’t allow that.


Thank you, the links work now!


you’re the man! Thanks again.
