Large language models like ChatGPT have traditionally had trouble reading and interacting with spreadsheets, limiting their application in this realm, but recent research from Microsoft claims to have found an answer.
The paper, SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models, described the problems LLMs typically face with spreadsheets and proposed what it called the “SheetCompressor” framework to address them.
The issue LLMs have with spreadsheets comes down to token requirements. LLMs run on “tokens,” the basic units of data the model processes: words, subword fragments, or pieces of punctuation into which the model decomposes text. LLMs operate by converting input text into a series of tokens, which the model then uses to understand and generate responses.
The number of tokens determines the computational cost and capacity needed to handle the input, making token management crucial, especially for complex data like spreadsheets. For example, the phrase “I heard a dog bark loudly at a cat” would be represented by roughly nine tokens, about one for each word. To preserve system resources, many LLMs impose token limits, but even without a hard cap, complex jobs are resource intensive, and the extra computation degrades both performance and efficiency.
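As a concrete illustration, OpenAI’s open-source tiktoken library shows how one GPT-style tokenizer splits that phrase; the exact count varies from tokenizer to tokenizer.

```python
# Counting tokens with OpenAI's tiktoken library; the exact token count
# depends on which tokenizer a given model uses.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4
tokens = enc.encode("I heard a dog bark loudly at a cat")
print(len(tokens))                        # roughly one token per word
print([enc.decode([t]) for t in tokens])  # ['I', ' heard', ' a', ...]
```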
Typically, every part of a spreadsheet costs tokens, even blank cells, repeating cells, and cells holding irrelevant information, meaning even a simple spreadsheet has a much higher token requirement than ordinary text. Furthermore, LLMs often struggle with spreadsheet-specific features such as cell addresses and formats, complicating their ability to parse and use spreadsheet data effectively. These challenges have limited how far generative AI models can be applied to reading and interacting with spreadsheets, and considering how heavily the profession relies on them, that in turn limits their usefulness for deep accounting work.
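To see why, consider a naive dump of a sheet into text: every cell, including blanks, becomes characters the model must tokenize. A minimal sketch, assuming a simple comma-separated cell format rather than any particular tool’s actual serialization:

```python
# Naive cell-by-cell serialization: every cell, even an empty one,
# contributes characters (and therefore tokens) to the prompt.
rows = [
    ["Date",       "Amount", "", ""],
    ["2024-01-01", "100",    "", ""],
    ["2024-01-02", "250",    "", ""],
]

def naive_serialize(rows):
    lines = []
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            address = f"{chr(64 + c)}{r}"       # A1-style cell address
            lines.append(f"{address},{value}")  # blanks still emit "C1,"
    return "\n".join(lines)

print(naive_serialize(rows))  # 12 lines for 12 cells, half of them blank
```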
What Microsoft researchers found, in short, is that the LLM does not need to burn tokens reading and processing the entire spreadsheet. Instead, a compressed version of the document can serve as something like an index, with markers or “anchors” flagging especially important information such as totals. Additional compression comes from grouping together similar types of data, such as date columns. In a sense, the LLM does not work through the spreadsheet itself but references it via a much more efficient index.
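A toy sketch of the idea, assuming a simple two-column sheet; the anchor heuristic and range notation here are illustrative assumptions, while the paper’s SheetCompressor uses more sophisticated structural-anchor extraction and format-based aggregation.

```python
# Toy index-style compression: keep "anchor" rows (header, totals) verbatim
# and collapse runs of similar data rows into a single range reference.
rows = [
    ["Date",       "Amount"],  # header -> anchor
    ["2024-01-01", "100"],
    ["2024-01-02", "250"],
    ["2024-01-03", "75"],
    ["Total",      "425"],     # total -> anchor
]

def compress(rows):
    index, run_start = [], None
    for i, row in enumerate(rows):
        is_anchor = i == 0 or row[0].lower().startswith("total")
        if is_anchor:
            if run_start is not None:  # close any open run of data rows
                index.append(f"A{run_start + 1}:B{i}: similar date/amount rows")
                run_start = None
            index.append(f"row {i + 1} (anchor): {','.join(row)}")
        elif run_start is None:
            run_start = i
    if run_start is not None:  # close a run that reaches the last row
        index.append(f"A{run_start + 1}:B{len(rows)}: similar date/amount rows")
    return index

for entry in compress(rows):
    print(entry)
# row 1 (anchor): Date,Amount
# A2:B4: similar date/amount rows
# row 5 (anchor): Total,425
```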
Complex spreadsheets are further supported through a concept called “chain of spreadsheet,” which is similar to “chain of thought” prompting. The method unfolds in two stages. First, the model identifies the table that is relevant to the query and determines the precise boundaries of the relevant content. This step ensures that only pertinent data is considered in the subsequent analysis. Then, the query and the identified table section are re-input into the LLM. The model then processes this information to generate an accurate response to the query.
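In rough code form, the two-stage flow might look like the sketch below; `ask_llm()` and `extract_range()` are hypothetical stand-ins for an LLM client and a range-slicing helper, and the prompt wording is illustrative rather than the paper’s.

```python
# A sketch of the two-stage "chain of spreadsheet" flow.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def extract_range(sheet: str, cell_range: str) -> str:
    raise NotImplementedError("slice the sheet down to the given range")

def chain_of_spreadsheet(compressed_sheet: str, query: str) -> str:
    # Stage 1: ask the model which table region the query actually needs.
    cell_range = ask_llm(
        f"Given this compressed spreadsheet:\n{compressed_sheet}\n"
        f"Which cell range is relevant to: {query}?"
    )
    # Stage 2: re-prompt with only that region, not the whole sheet.
    region = extract_range(compressed_sheet, cell_range)
    return ask_llm(f"Using only this table region:\n{region}\nAnswer: {query}")
```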
“Through the CoS, SPREADSHEETLLM effectively handles complex spreadsheets by breaking down the process into manageable parts, thus enabling precise and context-aware responses,” said the paper.
Experiments with this method found that it significantly increased performance on larger spreadsheets, where token limits are a particular challenge. The F1 score (a standard measure of a model’s accuracy) for massive spreadsheets was 75% higher than GPT-4’s and 19% higher than that of TableSense-CNN, another AI approach to spreadsheet understanding; for large spreadsheets, the differences were 45% and 17%, respectively; for medium spreadsheets, 13% and 5%; and for small spreadsheets, 8%. Overall, the results show that while the method becomes more effective the larger the spreadsheet, it still improves performance even on small ones.
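For context, F1 is the harmonic mean of precision and recall, a standard way to score detection tasks such as finding table boundaries; a minimal sketch of the computation:

```python
# F1 score: the harmonic mean of precision and recall.
def f1_score(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.8, 0.7))  # 0.7466...
```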
“Through a novel encoding method, SHEETCOMPRESSOR, this framework effectively addresses the challenges posed by the size, diversity, and complexity inherent in spreadsheets,” the paper concluded. “It achieves a substantial reduction in token usage and computational costs, enabling practical applications on large datasets. The fine-tuning of various cutting-edge LLMs further enhances the performance of spreadsheet understanding. Moreover, Chain of Spreadsheet, the framework’s extension to spreadsheet downstream tasks illustrates its broad applicability and potential to transform spreadsheet data management and analysis, paving the way for more intelligent and efficient user interactions.”
Implications
Donny Shimamoto, founder and managing director of the technology-focused accounting firm IntrapriseTechKnowlogies, said that by enabling LLMs to “understand” tabular spreadsheets, accountants will have an increased ability to summarize or analyze a set of data. More than that, he said, the capability will likely let even non-accountants do the same, removing the accountant as the middle person. While some accountants may see this as a threat, he said its main effect would be to clear the majority of simple inquiries from their plates, letting them save their energy for more complex questions and deeper analysis.
“Implementing something like this will require good testing to ensure that the risk of hallucinations is minimized, especially if it is going to help provide non-accountants with information to support decision-making,” said Shimamoto.
David Wood, a Brigham Young University accounting professor who specializes in AI within the profession, raised a similar point: the technology would allow those without significant technical knowledge to perform the kinds of tasks that previously could only be done by seasoned accounting experts. He raised the example of novices using generative AI to build spreadsheets that, until now, only expert professionals could put together. However, while he thinks this could be possible soon, he said that, despite the Microsoft research, it hasn’t arrived just yet.
“However, there are at least three challenges holding back using GenAI with spreadsheets: the size and complexity of the spreadsheets, and the required accuracy for most uses of spreadsheets. This paper takes a large step in the right direction, but it doesn’t solve all the challenges and more work will still be needed in each of these three areas. It would be a mistake to assume that after reading this paper, we have fully figured out how to use spreadsheets and GenAI together. More work is still needed. … I think the path these researchers are taking is significant, but the research ‘hasn’t arrived yet,’ meaning that more work is needed. The accuracy rates are just not high enough…yet. Hopefully this paves the way for the next researcher to move it forward further,” he said in an email.