LLMs work very well. But there is a translation problem between LLMs and the environment that LLMs have to interact with (software and humans).
Is the Deepseek-OCR Context Compression concept relevant to this? https://deepseek.ai/blog/deepseek-ocr-context-compression
I heard Karpathy respond that he feels it might make more sense to do away with tokens and process everything as images similar to how humans do.
Is the Deepseek-OCR Context Compression concept relevant to this? https://deepseek.ai/blog/deepseek-ocr-context-compression
I heard Karpathy respond that he feels it might make more sense to do away with tokens and process everything as images similar to how humans do.