Hello everyone, I'm Old V, a very unprofessional AI non-geek.
I woke up at 6 o'clock this morning; probably because I'm old, I couldn't get back to sleep. So I didn't fight it, picked up my phone, and started chatting with Kimi. Since Kimi's upgrade, I've found it more confident. Its daily output follows one of two patterns: either "three ways to help you complete the entire process of X", or a one-sentence summary, "XXXXXX". Accurate or not, it establishes itself as the knowledgeable AI first.
As we all know, ChatGPT's context has reached 128K, Claude's is even larger, and Gemini claims a 1M context. So I consulted Kimi in detail about how a large model's so-called context is defined and calculated.
Simply put, when calculating the context window, current mainstream AI models **count the tokens of both the prompt and the completion**. In other words, the context window is a **total length limit**, covering:
- tokens entered by the user (including system prompts, conversation history, etc.)
- tokens generated by the model (i.e. the output)
An example:
- if a model's context window is **128K tokens** (such as GPT-4-turbo),
- and you have entered **100K tokens** of content (prompts plus conversation history),
- then the model can generate at most **28K tokens** of output; anything more triggers an "exceeded context length" error.

This is why, in actual use, **the longer the input, the shorter the output**.
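To make the budget arithmetic concrete, here is a minimal sketch using OpenAI's `tiktoken` tokenizer. The 128K window is illustrative, and real chat prompts also carry a few tokens of message-format overhead that this ignores:

```python
# Minimal sketch of context-window budgeting: the window is shared
# between prompt and completion, so whatever the prompt consumes is
# subtracted from the space left for the model's output.
import tiktoken

CONTEXT_WINDOW = 128_000  # illustrative total window (prompt + completion)

enc = tiktoken.get_encoding("cl100k_base")

def max_output_tokens(prompt: str) -> int:
    """Return how many tokens remain for the model's reply."""
    used = len(enc.encode(prompt))
    remaining = CONTEXT_WINDOW - used
    if remaining <= 0:
        raise ValueError("Prompt alone already exceeds the context window")
    return remaining

# A 100K-token prompt would leave at most 28K tokens for the output.
```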
However, different AI models **place different token limits on input and output**: each imposes its own cap on the **total context length** and on the **maximum output length per request**, for example:
✅ 1. **Total context length ≠ freely allocated between input and output**
Although most models cap the **sum of input + output tokens** at a fixed limit, some impose **additional restrictions** on **input** and **output** separately, so the budget is not entirely freely allocated. For example:
| Model/Platform | Total context length | Input limit | Output limit | Notes |
|---|---|---|---|---|
| Claude 3.7 | 204K tokens | No separate limit | No separate limit | Input + output together must not exceed 204K |
| GPT-4-turbo | 128K tokens | No separate limit | Defaults to 4K; adjustable up to 8K or 16K | Output capped by the max_tokens parameter |
| DeepSeek V3 | 64K tokens | 56K tokens | 8K tokens | Input and output limits are explicitly separate |
✅ 2. **Some models put a hard cap on output length**
Even when the total context is long, the **output may be limited separately**, for example (a code sketch follows this list):
- **DeepSeek V3**: output is explicitly capped at 8K tokens.
- **GPT-4o mini**: output is capped at 16K tokens, even though the context is 128K.
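As a sketch of how that output cap appears in practice: with the OpenAI Python SDK, the `max_tokens` parameter limits the completion independently of the window. The model name and cap below are illustrative, not a spec:

```python
# Capping the *output* with max_tokens, independent of the total window.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # 128K window, but the reply is capped below
    messages=[{"role": "user", "content": "Explain context windows in one paragraph."}],
    max_tokens=4096,  # hard cap on completion tokens for this request
)
print(response.choices[0].message.content)
print(response.choices[0].finish_reason)  # "length" means the cap was hit
```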
Okay, this piqued my interest. GPT-4 supports a 128K context, roughly 96K Chinese characters; Claude supports a 200K context, and a 128K output is reportedly available online. What if I had them write short stories?
So I raised this topic with Kimi again. I have to say, Kimi does not lack the air of an expert. The answer it gave me:
| Model | Suitable genres | Output limit | Conversational experience | Access |
|---|---|---|---|---|
| Kimi-K2 | Power-fantasy fiction, urban stories, ensemble casts | 128K tokens | Best feel for Chinese prose; most consistent over multi-round continuation | kimi.moonshot.cn / API |
| Claude 3.7 | Emotional drama, ensemble casts, suspense | 128K tokens | Delicate character psychology, rigorous logic | Anthropic official site / Claude Code |
| GPT-4-turbo | Suspense, mystery, sci-fi settings | 4K-16K tokens | Clear structure; more "standard" prose | OpenAI API / ChatGPT Plus |
| DeepSeek-R1 | Sci-fi, wild ideas, suspense | 8K tokens (per reply) | Deep-thinking mode is creative, but long pieces must be written in segments | deepseek.com / API |
| Model | Strengths for short stories | Notable weaknesses | One-line usage advice |
|---|---|---|---|
| Qwen3 | Best feel for Chinese prose; can produce a complete 8K-12K-token first draft in one go; strong at outline → body → polish | Slightly weaker on English or cross-cultural scenes | First choice for pure-Chinese power-fantasy/urban fiction |
| Gemini | The 1M-token window can take the entire short story plus its rewriting history at once for a holistic revision | A single output is still capped at 65K, so long texts need manual segmentation; its feel for Chinese prose is average | Use it when you need to "feed it the whole book, then rewrite the whole book" |
| Grok | Real-time web access plus a humorous tone; turns out topical sci-fi and cyber-satire shorts in seconds | Lacks depth in Chinese detail and character psychology; output capped at about 8K | Use it for light sci-fi / dark-humor shorts built on current memes |
| GLM-4.5 | Tidy structure and solid closed-loop logic, suited to high-concept or suspense shorts | Creative surprises are average; it takes multiple rounds of prompting to open up its imagination | First choice for suspense/mystery and high-concept sci-fi |
Okay, so: could I use GLM-4.5 to build the framework, then throw it to Qwen to generate the full text, and finally let Gemini act as editor to check and polish?
Kimi affirmed my idea and gave me some pointers.
① GLM-4.5 skeleton (high concept + closed-loop suspense)
Goal: have GLM generate the "worldview + main conflict + chapter nodes" with as few tokens as possible, keeping it to 800-1,200 tokens.
Demonstration prompt:
```
[Role] You are a suspense-novel structural architect, skilled in the "high-concept logline + three acts, nine nodes" method.
[Task] Please provide, in Chinese:
1. A one-sentence high concept (≤30 words)
2. Three acts, nine nodes (each node ≤25 words)
3. A 100-word outline for each chapter (9 chapters in total)
[Theme] Near-future sci-fi suspense
[Keywords] Memory tampering, quantum-encrypted will
[Output format] Markdown table
```
② Qwen3 expands it into a complete short story (done in one pass, within 10K tokens)
Goal: expand the nine-chapter outline into a smooth 8,000-10,000-word text in one pass.
Demonstration prompt:
```
[Role] You are a best-selling author of Chinese urban sci-fi, with a delicate style and lively dialogue.
[Input] (paste the nine-node outline generated by GLM)
[Requirements]
- 900-1,100 words per chapter, 9 chapters in total
- Chapter 1 opens with "When I woke up, there was a photo of me sleeping in my phone's album"
- Colloquial dialogue, limited third-person point of view
- Escalating suspense, ending on an open hook
[Output] Plain text; do not keep the outline headings
```
③ Gemini global polish + consistency check (leveraging the 1M-token window)
Goal: read the full text plus (optional) reader feedback in one go and output a refined manuscript within 65K tokens.
Demonstration prompt:
```
[Role] You are a senior sci-fi editor, skilled at cutting redundancy, sharpening suspense, and unifying timelines.
[Task]
1. Check that the timeline, character names, and tech settings are self-consistent
2. Delete duplicated information and cut the word count by 10%
3. Insert a "countdown" sentence at the key suspense points
4. Turn colloquial dialogue into tighter short sentences
[Input] (paste the Qwen3 full text)
[Output] The polished full text, with no visible editing marks
```
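If you have API access rather than just the free web versions, the three stages can be chained programmatically. A rough sketch follows, assuming each vendor exposes an OpenAI-compatible chat endpoint; all base URLs, model IDs, and environment-variable names here are placeholders of mine, so check each provider's docs for the real values:

```python
# Rough sketch of the GLM -> Qwen -> Gemini pipeline over OpenAI-compatible
# chat endpoints. All base URLs, model IDs, and env vars are PLACEHOLDERS.
import os
from openai import OpenAI

def complete(client: OpenAI, model: str, prompt: str, max_tokens: int) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return resp.choices[0].message.content

# Paste the three demonstration prompts from above into these strings.
SKELETON_PROMPT  = "[Role] You are a suspense-novel structural architect..."
EXPANSION_PROMPT = "[Role] You are a best-selling author of Chinese urban sci-fi..."
POLISH_PROMPT    = "[Role] You are a senior sci-fi editor..."

glm    = OpenAI(api_key=os.environ["GLM_API_KEY"],    base_url="https://glm.example/v1")     # placeholder
qwen   = OpenAI(api_key=os.environ["QWEN_API_KEY"],   base_url="https://qwen.example/v1")    # placeholder
gemini = OpenAI(api_key=os.environ["GEMINI_API_KEY"], base_url="https://gemini.example/v1")  # placeholder

# Stage 1: skeleton, kept to ~1,200 tokens.
outline = complete(glm, "glm-4.5", SKELETON_PROMPT, max_tokens=1200)
# Stage 2: expand the outline into a full draft.
draft = complete(qwen, "qwen3", EXPANSION_PROMPT + "\n[Input]\n" + outline, max_tokens=10_000)
# Stage 3: global polish over the entire draft.
final = complete(gemini, "gemini-pro", POLISH_PROMPT + "\n[Input]\n" + draft, max_tokens=65_000)
print(final)
```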
However, when working through the free web versions, watch out for the following.
Limitations of the free web versions:
- **GLM-4.5 web version**: a single reply tops out at about 4K tokens (≈3,000 Chinese characters). When you ask it for the 9-node skeleton, keep it under 3,000 characters; anything longer has to be split into two passes.
- **Qwen3 web version**: outputs about 2K tokens (≈1,500 Chinese characters) per reply. To get 8-10K characters in one sitting, you have to click "continue writing" manually.
- **Gemini web version (AI Studio)**: the window may be advertised as 1M, but the free tier is rate-limited:
  - 60 requests per minute
  - a total of 20,000 tokens per minute

That is enough to paste in Qwen's 8K text for polishing, but it can only return about **8K tokens** per reply; beyond that you have to say "continue".

A "segmented" workflow that gets through on the web versions:
① GLM: ask it for the 9 nodes first, each within 25 words and under 400 words in total, done in one pass.
② Qwen: expand chapter by chapter following the nodes, about 1,500 words per round; click "continue writing" 5-6 times to reach 8K+ (this loop can also be automated over an API, as sketched below).
③ Gemini: paste Qwen's full text in one go. If the reply is truncated, just say "continue polishing" and it will carry on from the previous context.
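For completeness, here is what automating that "keep saying continue" dance might look like over an API: watch `finish_reason`, and when a reply is cut off for length, feed it back into the history and ask the model to pick up where it stopped. A minimal sketch, again assuming the OpenAI SDK and an illustrative model name:

```python
# Automating the "continue writing" pattern: loop until the model stops
# on its own (finish_reason != "length") rather than hitting the cap.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write the 9-chapter story from the outline above."}]
chunks = []

while True:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative
        messages=messages,
        max_tokens=4096,      # per-reply output cap
    )
    choice = resp.choices[0]
    chunks.append(choice.message.content)
    if choice.finish_reason != "length":
        break  # the model finished naturally
    # The reply was truncated: keep it in the history and ask to continue.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

story = "".join(chunks)
```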
Sounds fun; let's get started. Of course, I didn't use its example theme; I smuggled in my own. And I didn't have Gemini do the polishing either. Instead, I threw the framework generated by GLM at DeepSeek, Qwen3, ChatGPT, Grok, Gemini, Claude, and even GLM itself and Kimi. I was very curious to see which of them has the makings of a literary master.
The links to the generated stories are at the bottom of this article. If you're interested, read them and judge for yourself, then leave your conclusions in the comments. As for me, my impressions were:

- **Gemini**: the million-token context is no joke; it output the entire novel with ease. Above-average quality.
- **Qwen**: a similarly large context, and it output the full text in one go, but the quality was underwhelming; to me it sits at the bottom.
- **DeepSeek**: limited by its overall context size and per-reply output cap, it produced only one chapter at a time. I had to keep telling it to continue with chapter 2, then chapter 3, until the whole thing was out. The quality? The opening felt amazing, but the ending felt a bit sloppy; maybe that is down to the 64K context limit.
- **Kimi and GLM**: like DeepSeek, they needed constant prompting to continue.
- **Grok**: it also needed repeated "continue"s, but unlike the other models it does not output the text in the conversation; it returns a link that opens a sidebar, from which you can copy or download. The novel's quality was average, and two chapters seemed partially repeated.
- **Claude and ChatGPT** turned in the worst output of the lot. Claude first: it seemed to try to output the full text at once, but, whether due to context limits or request limits for a free user like me, the output stopped at chapter 3. I said "continue" several times, but the replies varied wildly in length, and in the end it simply refused to serve me. I stopped for a few hours and continued at night, barely finishing; and the further it went, the more obvious the logical problems became.
- **ChatGPT**: after it finished chapter 5, "chapter 6" started outputting content from an earlier topic of mine. I was stunned and at a loss for a while. In the end I had no choice but to paste the half it had already written back in along with the prompts to get the second half done. The quality of the text is good, on par with DeepSeek, but the ending was also a bit anticlimactic.
Okay, if you're interested, please read the works of these masters.
## Novels created by the major models
The following short stories were created by each major model from the same unified framework. Everyone is welcome to read and comment:
- [Novel written by ChatGPT](./Article_Chatgpt)
- [Novel written by Claude](./Article_Claude)
- [Novel written by DeepSeek](./Article_DS)
- [Novel written by Gemini](./Article_Gemini)
- [Novel written by GLM](./Article_GLM)
- [Novel written by Grok](./Article_Grok)
- [Novel written by Kimi](./Article_Kimi)
- [Novel written by Qwen](./Article_QWEN)