Very Good Passive Vocab Strategy

PearlWall · August 4, 2022, 5:56am

Recently found this:

https://www.reddit.com/r/Anki/comments/kngnfl/tips_for_using_language_learning_with_netflix/

Quoted from that post:

“My method is:

Download the subtitles using this greasemonkey script
Drop the subtitle file into CTA
Copy the unknown words into Purple Culture’s vocab list generator
Copy the resulting list into Google Docs
Download as CSV
Import into Anki
Add audio with AwesomeTTS

This may seem like a lot of steps but it only takes a couple minutes. I wouldn’t add all the unknown words, just enough to get to 90% or more comprehension. There’s a few more steps in there that are specific to my process, like adding tags, but that’s the basic process.”
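The "just enough to get to 90% or more comprehension" idea from the quoted method can be sketched roughly as follows. This is only an illustration, not what CTA or the original poster actually runs: it assumes the subtitle text has already been split into words (Chinese needs a word segmenter for that), and the function name `words_to_reach` and the sample data are made up.

```python
from collections import Counter

def words_to_reach(tokens, known, target=0.90):
    """Return the most frequent unknown words needed to reach `target` coverage.

    tokens: list of already-segmented words from the subtitles (assumed input)
    known:  set of words the learner already knows
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    # Running count of word occurrences the learner can already read.
    covered = sum(n for w, n in counts.items() if w in known)
    # Most frequent unknown words first: each one adds the most coverage.
    unknown = sorted(
        ((w, n) for w, n in counts.items() if w not in known),
        key=lambda item: (-item[1], item[0]),
    )
    picked = []
    for word, n in unknown:
        if covered / total >= target:
            break
        picked.append(word)
        covered += n
    return picked

# Hypothetical, hand-segmented sample data.
tokens = ["我", "喜欢", "看", "电影", "电影", "电影", "字幕", "字幕", "很", "有趣"]
known = {"我", "喜欢", "看", "很"}
print(words_to_reach(tokens, known, target=0.90))
```

Taking the most frequent unknown words first means the shortest possible study list for a given coverage target, which matches the advice not to add every unknown word.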

Basically all you need is the following:
Step #1: There are Greasemonkey scripts you can download for Disney Plus and Netflix (and probably YouTube, etc.).

Get access to the legacy Skritter website and download all your words as an Excel file
Import your Excel words into Chinese Text Analyzer
Download the subtitles using this greasemonkey script
Save as a CSV
Drop the subtitle file into CTA
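If you ever want the plain dialogue text outside of CTA, a small cleanup sketch is below. It assumes the downloaded subtitles are in SRT format (numbered cues with `-->` timing lines), which may not be what a given Greasemonkey script produces; `srt_to_lines` is a made-up helper name, and WebVTT headers are not handled.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_lines(srt_text):
    """Return only the dialogue lines from SRT-formatted subtitle text."""
    lines = []
    for raw in srt_text.splitlines():
        line = raw.strip()
        # Skip blanks, cue numbers, and timing lines.
        # Caveat: a dialogue line made only of digits is dropped too.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        # Remove simple formatting tags like <i>...</i>.
        lines.append(re.sub(r"<[^>]+>", "", line))
    return lines

# Hypothetical sample in SRT format.
sample_srt = """1
00:00:01,000 --> 00:00:03,000
你好

2
00:00:04,000 --> 00:00:06,500
<i>我喜欢看电影</i>
"""
print(srt_to_lines(sample_srt))
```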

Basically, Chinese Text Analyzer compares the words you know against the words in the subtitle text, and the words you don’t know “fall out”.
It then ranks those words by how often they occur in the movie, from most to least frequent. It also gives you feedback such as “you know 92% of the words in this script”.
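For anyone curious what that comparison amounts to, here is a minimal sketch of the same idea. It is not CTA's actual implementation: it assumes the text is already segmented into words, and `analyze` plus the sample data are hypothetical.

```python
from collections import Counter

def analyze(tokens, known):
    """Return (percent of word occurrences known, unknown words by frequency)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    known_tokens = sum(n for w, n in counts.items() if w in known)
    # Unknown words, most frequently spoken first (ties broken by the word itself).
    unknown = sorted(
        ((w, n) for w, n in counts.items() if w not in known),
        key=lambda item: (-item[1], item[0]),
    )
    percent_known = 100.0 * known_tokens / total if total else 0.0
    return percent_known, unknown

# Hypothetical, hand-segmented sample data.
script_tokens = ["你", "知道", "吗", "这", "是", "成语", "成语", "字幕"]
my_words = {"你", "知道", "吗", "这", "是"}
percent, unknown_words = analyze(script_tokens, my_words)
print(f"You know {percent:.1f}% of the words in this script")
for word, n in unknown_words:
    print(word, n)
```

Note that the percentage here counts every occurrence of a word, so a common word you know raises it more than a rare one.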

I haven’t found a more powerful tool for being able to just “relax” and watch a movie in Chinese, knowing that it will take me only a couple of seconds to get all of these statistics and the words/chengyu I don’t know from the software. So the passive vocab strategy is basically just being exposed to a bunch of words by watching anything, knowing you can make it as active as you want on your next rewatch by processing however many new words you think necessary.

I’m currently using the Wenlin dictionary to determine “relative frequency”, but I was hoping someone had either a website or an Excel file where you can easily look up the frequency of a word. I know of some that are out there, like the one below.

But these files are confusing. For example, many words are listed as ranked “11”, like 50 times. I’m sure someone would understand that. But in lieu of this, I was hoping to find a straightforward Excel file with maybe the top 50,000 words, so that 1) I can VLOOKUP the file without breaking my spreadsheet and 2) I can understand how the words are ranked.
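One possible reading of the repeated “11” is that the column holds a raw count (or a tied rank) rather than a unique position, though that is a guess about files not shown here. Under that assumption, a lookup with one unique rank per word could be sketched like this; the column names `word` and `count` and the sample rows are made up.

```python
import csv
import io

def build_rank_lookup(csv_text):
    """Map each word to a unique 1-based rank, highest count first."""
    rows = [
        (r["word"], int(r["count"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # list.sort() is stable, so words with equal counts keep their file order.
    rows.sort(key=lambda row: -row[1])
    return {word: rank for rank, (word, _) in enumerate(rows, start=1)}

# Hypothetical frequency file: three words share the count 11.
sample_csv = "word,count\n的,500\n是,300\n电影,11\n字幕,11\n成语,11\n"
ranks = build_rank_lookup(sample_csv)
print(ranks.get("字幕"))    # a unique position despite the shared count
print(ranks.get("不存在"))  # None, the equivalent of a failed VLOOKUP
```

A dictionary lookup like this plays the role of VLOOKUP and stays fast even with tens of thousands of rows.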

PearlWall · August 8, 2022, 5:15pm

For anyone interested:

If you click into each file, you can download a CSV file. Then convert them to Excel and copy and paste them together to create one list. This is good for 1) “vetting” new words to see how frequent they are before bothering to learn them, and 2) finding the gaps in your vocab by running a VLOOKUP against the 56k list; any gaps should fall out.
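The same merge-and-compare can also be done without a spreadsheet. The sketch below assumes each CSV has no header row and holds the word in its first column, in frequency order, which may not match the real files; `merge_word_lists`, `find_gaps`, and the sample data are hypothetical.

```python
import csv
import io

def merge_word_lists(csv_texts):
    """Concatenate several CSVs (word in first column), dropping repeats."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            if not row or not row[0].strip():
                continue  # skip empty rows
            word = row[0].strip()
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged

def find_gaps(frequency_list, known, top_n):
    """Words among the top_n most frequent that are not in the known set."""
    return [w for w in frequency_list[:top_n] if w not in known]

# Hypothetical contents of two downloaded files, most frequent words first.
part1 = "的\n是\n电影\n"
part2 = "电影\n字幕\n成语\n"
merged = merge_word_lists([part1, part2])
gaps = find_gaps(merged, known={"的", "是", "字幕"}, top_n=4)
print(merged)
print(gaps)
```

With real files on disk, each text would come from something like `open(path, encoding="utf-8", newline="").read()`.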

I don’t know what it is based on, but “zero to hero” is a big name in the language learning community.

PearlWall · October 17, 2022, 7:21am

Just FYI to anyone interested… it is a really good list.

It is this one:

Draft for modern Chinese word set for common use ≪≫ () (2008) compiled by the State Language Commission of China [9]. This list contains 56,008 frequency-ranked words, the frequencies of which are based on a segmented part of 45 million characters from the Chinese (General) Balanced Corpus, a segmented corpus of 135 million characters based on People’s Daily 2001-2005, and a modern Chinese literature corpus of 70 million characters constructed by Xiamen University. The word frequencies themselves, however, are not yet publicly available.

SkritterOlle · October 23, 2022, 8:38pm

I think something went wrong with your post. The link you shared doesn’t work, and the preview shows information that doesn’t match the text you have posted below it. The text you posted also contains broken links to images and symbols whose meaning I don’t know. You also have a reference [9] in your text that doesn’t lead anywhere.

In any case, I’ve worked a lot with the SUBTLEX-CH corpus and built decks that show high-frequency words in that list that are omitted from HSK. You can check it out here (also contains links to Skritter decks for studying this in our apps):

For more frequency stuff, also check:

PearlWall · October 24, 2022, 2:21am

This webpage has a table labeled “Word frequency lists of Chinese”.
One of the lists is: Draft for modern Chinese word set for common use
The complete list is available in the Reddit post made by zerotohero:

https://www.reddit.com/r/ChineseLanguage/comments/okonc2/the_ultimate_chinese_phrasebook_56000_phrases/

You can create an Excel file of this list pretty easily if you just click around. There is a text file you can copy and paste into Excel. I would just post the Excel list here in Skritter, but Skritter doesn’t allow that.

SkritterOlle · October 24, 2022, 5:20pm

Thank you, the links work now!

PearlWall · October 25, 2022, 9:59pm

You’re the man! Thanks again.