Accounting

Which generative AI model did best on the CPA exam? Depends on the section

ChatGPT is no longer the only large language model to pass the CPA exam.

ChatGPT 3.5 initially bombed the CPA exam before version 4.0 passed it, and ChatGPT remains the top performer overall. But like any human accountant, it has its strengths and weaknesses.

These were part of the findings of a recent paper from Case Western Reserve University and accounting automation solutions provider AIgency. The researchers systematically evaluated the performance of Google Gemini, ChatGPT-4, Claude, Mixtral and Llama-2b on multiple-choice questions from CPA test preparation tools.

Overall, they found that ChatGPT-4 scored the best, with Claude 3-opus coming in a close second, followed by Google Gemini Advanced, then Mixtral-8x7b-32768. Llama 2B-70b-4096 did the worst.

Source: William Zacher Jr. & Sanmukh Kuppannagari

However, as the results show, not every model did uniformly well on all sections. ChatGPT, a strong performer overall, was especially good on BAR, the business analysis and reporting section. Its weakest section was REG, the regulation section that is mostly devoted to tax, yet it still did better there than any other model. Claude was the best performer on AUD, the auditing and attestation section. Its weakest section was FAR, financial accounting and reporting, but even there its performance was second only to ChatGPT's. Gemini was the second-strongest performer on BAR but did not do as well on REG. Mixtral had decent enough scores overall compared to a human but would pass only BAR, making it a mediocre performer compared to its peers. Llama was the only model that would not pass any section, and it did especially poorly on REG. It was also the only model that did worse than the average human test taker. On REG, human test takers averaged 59.19%, according to the paper.

“The study revealed that while some LLMs have made significant advances in mimicking the complex decision-making skills required for CPA exams, there remains variability in performance across different sections of the test,” said the paper. “This variability underlines the importance of tailored training and specialization in developing LLMs for professional applications such as the CPA exams.”

To run the test, the researchers drew their multiple-choice questions from the Becker CPA test preparation suite. Google Gemini, Claude and ChatGPT-4 were accessed through their online platforms, while Mixtral and Llama-2b were accessed through the Groq platform, an advanced computational infrastructure for high-speed AI processing. The questions were copied and pasted directly from Becker's test preparation material into the AI platforms, without any additional prompting or modification, so that each model received the questions in their original form, as they would appear in a CPA exam context.

Becker's platform randomized the questions in batches of 15, which the researchers said further mitigated potential selection bias. The tester responsible for entering the questions into the AI models deliberately refrained from reading or evaluating them beforehand to prevent any unconscious bias in the prompting process. For each question, the tester selected the AI model's first response marked as "correct," irrespective of any variations in the explanations or outputs provided by different models.

Each AI model took each multiple-choice section of the CPA exam three times, allowing a more comprehensive assessment of its performance across attempts. The criterion for success in this study was a passing score, defined as an average score of 75 or higher, on a given section.

The researchers said the data indicates there is no universal model for all tasks, so it is important to match the model to the application. For example, the paper concluded that ChatGPT is "the only real option for zero-shot BAR automation," as "no other model came close to its performance, and it had a relatively narrow variance," meaning ChatGPT-4 could be used to help automate financial statement preparation or additional forecasting. Claude, on the other hand, was probably better suited to auditing-related tasks; its auditing performance, the paper said, "is a solid indication that it can be used for fraud detection and internal control validation."

“It is apparent from the results that there is no clear-cut winner,” the researchers concluded. “Most companies utilizing AI to perform financial administration functions should use a software infrastructure that allows them to use multiple task-dependent AI models.”

However, the researchers did recommend that “model selection for AI in an applied accounting setting should avoid Llama-2B, which performed worse than any other model in every section.”

EV makers win 2-year extension to qualify for tax credits

The Biden administration gave carmakers a partial reprieve in finalizing electric-vehicle tax credit rules intended to loosen China’s grip on battery materials crucial to the car industry’s future.

Starting in 2025, plug-in cars containing critical minerals from businesses controlled by U.S. geopolitical foes, including China, will be ineligible for tax credits of up to $7,500, the Treasury Department said Friday. Automakers will get an extra two years, however, to shore up sourcing of graphite and other materials considered difficult to trace to their origin.

The rules put finishing touches on President Joe Biden’s push to develop an alternative to China’s preeminent EV and battery supply chains. The administration is imposing stringent sourcing requirements for raw materials and components in order for electric cars to qualify for the tax credits that are a powerful draw for consumers otherwise put off by still-high prices.

"These actions provide a strong signal to automakers that we want to see EVs built here in America with components and critical minerals sourced from the U.S. and our allies and partners," White House climate adviser John Podesta said.

The two-year exemption speaks to the challenges automakers have had reducing their reliance on Chinese suppliers of materials such as graphite. The mineral used in battery anodes emerged as a geopolitical flashpoint last year when Beijing placed restrictions on exports, sparking fears of global shortages.

The Biden administration's rules don't allow tax breaks for vehicles with batteries containing critical minerals from foreign entities of concern (FEOCs), a term referring to businesses controlled by U.S. geopolitical foes such as China, North Korea, Russia and Iran. Those requirements take effect in 2025, as proposed.

But Biden has given auto and battery manufacturers some flexibility on this front, too. In December, the administration decided to allow materials from foreign subsidiaries of privately owned Chinese companies in non-FEOC countries — such as Australia or Indonesia — to count toward tax credit eligibility. This drew criticism from Western miners and policymakers who want Biden to more aggressively cut China out of the supply chain.

Automakers will now have until 2027 to curb their use of certain difficult-to-trace materials from FEOCs, provided they submit plans to comply once the two-year transition ends and the government approves those plans, the Treasury Department said.

“FEOC exemptions for any battery materials should be temporary,” said Abigail Hunter, the executive director of the Center for Critical Minerals Strategy at SAFE, a Washington think tank. “We need a clear exit strategy, lest we continue our dependencies on adversaries and further undermine the competitiveness of U.S. and allied critical minerals projects.”

The release of the rules concludes two years of work on requirements that have already reduced the number of EVs eligible for tax credits. About 20 models qualify today, compared with as many as 70 previously. Treasury Department officials said Friday they expect the number of qualifying vehicles to keep fluctuating as companies adjust their supply chains.

Automakers including Tesla Inc., General Motors Co. and Toyota Motor Corp. have lobbied for more flexibility in meeting the requirements. A lobby group representing automakers based outside the U.S. praised the additional two years provided for difficult-to-trace materials.

“It will take time for the global production and sourcing of graphite and other critical minerals needed to produce EVs to match the strict standards required by automakers,” Autos Drive America President Jennifer Safavian said in a statement.

Oregon senator Ron Wyden demands refunds for TurboTax customers over glitch

Senate Finance Committee Chairman Ron Wyden, D-Oregon, demanded in a letter that Intuit give refunds to Oregonians who, because of a glitch in the company's TurboTax tax prep software, were steered toward taking the standard deduction when they would have paid less tax by itemizing. The senator said the company knew of the glitch in early April but didn't acknowledge it until shortly before the filing deadline.

The glitch affected about 12,000 people, according to The Oregonian, some of whom reported paying hundreds of dollars more in tax than they needed to. Those affected were generally using the desktop version of the software rather than the online version.

“Fixing this error will require identifying all affected Oregonians, notifying them, and ensuring they can be made whole,” said the senator. “In part because of TurboTax’s various guarantees and market share, Oregonians who overpaid due to TurboTax’s error likely assumed the software opted them into claiming state standard deduction to minimize their taxes. That assumption was wrong. And because the vast majority of taxpayers understandably dread filing season and avoid thinking about taxes after it ends, many of those affected will not learn on their own that they overpaid. Intuit must act to inform them and help them get the full tax refunds they are entitled to receive.”

The TurboTax logo on a laptop computer in an arranged photograph in Hastings-on-Hudson, New York, U.S., on Friday Sept. 3, 2021. Photographer: Tiffany Hagler-Geard/Bloomberg

An Intuit spokesperson said the company is working to resolve the issue, citing its tax return lifetime guarantee.

“As part of our tax return lifetime guarantee, we are committed to the accuracy of TurboTax tax filers’ tax returns to ensure they receive the maximum refund possible. We are quickly working to resolve an issue impacting a small number of customers and actively engaging with those filers impacted to ensure their returns are correct and that they receive the maximum refund they are owed,” said the spokesperson.

The senator also asked Intuit to explain how the glitch happened in the first place and to provide an approximate timeline of the steps it took once it became aware of it. He also requested a precise count of how many people were affected, along with Intuit's plans for addressing the problem and preventing it in the future.

On the move: RSM names a client experience leader

RSM US named its first enterprise client experience leader; the Financial Accounting Foundation is looking for nominees for its Financial Accounting Standards Advisory Council; RKL named a new office managing partner; REDW appointed three new vice presidents; and other firm and personnel news from across the accounting profession.
