Accounting

Which generative AI model did best on the CPA exam? Depends on the section

ChatGPT is no longer the only large language model to pass the CPA exam.

ChatGPT 3.5 initially bombed the CPA exam before version 4.0 passed, and ChatGPT remains the top performer overall. However, like any human accountant, it has its strengths and weaknesses.

These were part of the findings of a recent paper from Case Western Reserve University and accounting automation solutions provider AIgency. The researchers systematically evaluated the performance of Google Gemini, ChatGPT-4, Claude, Mixtral and Llama-2b on multiple-choice questions from CPA test preparation tools.

Overall, they found that ChatGPT-4 scored the best, with Claude 3-opus coming in a close second, followed by Google Gemini Advanced, then Mixtral-8x7b-32768. Llama 2B-70b-4096 did the worst.

Source: William Zacher Jr. & Sanmukh Kuppannagari

However, as the results show, not every model did uniformly well on all sections. ChatGPT, while a strong performer overall, was especially good on the BAR section for business analysis and reporting. Meanwhile, although its weakest point was REG, the regulatory area that is mostly devoted to tax regulations, it did better on this section of the exam than any other model.

Claude was the best performer in the AUD section on auditing and attestation. While its weakest point was FAR, the section on financial accounting and reporting, even there its performance was second only to ChatGPT. Gemini was the second strongest performer on the BAR section, but did not do so well on REG.

Mixtral, overall, had decent enough scores compared to a human but would only pass BAR, making it a mediocre player compared to its peers. Llama was the only one that would not pass any section, and it did especially poorly on REG. It was also the only one that did worse than a human. The average score for human test takers on REG was 59.19%, according to the paper.

“The study revealed that while some LLMs have made significant advances in mimicking the complex decision-making skills required for CPA exams, there remains variability in performance across different sections of the test,” said the paper. “This variability underlines the importance of tailored training and specialization in developing LLMs for professional applications such as the CPA exams.”

To perform the test, the researchers drew their multiple-choice questions from the Becker CPA test preparation suite. Google Gemini, Claude and ChatGPT-4 were accessed via their online platforms. Mixtral and Llama-2b models were accessed through the Groq platform, an advanced computational infrastructure for high-speed AI processing. The questions were copied and pasted directly into the AI platforms from Becker’s test preparation material without any additional prompting or modification, so that each AI model received the questions in their original form, as they would appear in a CPA exam context.

Becker’s platform randomized the questions in batches of 15, which the researchers said further mitigated potential selection bias. The tester, responsible for inputting the questions into the AI models, deliberately refrained from reading or evaluating the questions beforehand to prevent any unconscious bias in the prompting process. For each question, the tester selected the AI model’s first response marked as “correct,” irrespective of any variations in the explanations or outputs provided by different models.

Each AI model was subjected to each multiple choice section of the CPA test three times, allowing for a comprehensive assessment of its performance across multiple attempts. The criterion for determining an AI model’s success in this study was achieving a passing score, defined as an average score of 75 or higher, on any given section.
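The pass criterion described above can be illustrated with a minimal sketch. It assumes the three attempts on a section are combined with a simple arithmetic mean, which the article implies but does not spell out. The function name, the model names and the scores are hypothetical and are not figures from the paper.

```python
from statistics import mean

# Passing threshold as stated in the article: an average score of 75 or higher.
PASSING_SCORE = 75


def passes_section(attempt_scores):
    """Return True if the mean of the attempt scores meets the threshold.

    Assumption: attempts are averaged with a plain arithmetic mean.
    """
    return mean(attempt_scores) >= PASSING_SCORE


# Hypothetical percentage scores for three attempts on one section.
# These are illustrative only and do not come from the paper.
example_attempts = {
    "model_a": [78.0, 74.0, 76.0],  # mean 76.0
    "model_b": [70.0, 72.0, 80.0],  # mean 74.0
}

results = {
    name: passes_section(scores) for name, scores in example_attempts.items()
}

for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
```

Under this reading, a model can pass a section even when a single attempt falls below 75, as long as the average across the three attempts reaches the threshold.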

The researchers said the data indicates there is no one universal model for all tasks, so it is important to use the right model for the right applications. For example, the paper concluded that ChatGPT is “the only real option for zero-shot BAR automation,” as “no other model came close to its performance, and it had a relatively narrow variance,” meaning that ChatGPT-4 could be used to help with automated financial statement preparation or additional forecasting. On the other hand, the researchers said Claude was probably better on auditing-related tasks, which the paper said “is a solid indication that it can be used for fraud detection and internal control validation.”

“It is apparent from the results that there is no clear-cut winner,” the researchers concluded. “Most companies utilizing AI to perform financial administration functions should use a software infrastructure that allows them to use multiple task-dependent AI models.”

However, the researchers did recommend that “model selection for AI in an applied accounting setting should avoid Llama-2B, which performed worse than any other model in every section.”

Accounting

Acting IRS commissioner reportedly replaced

Gary Shapley, who was named only days ago as the acting commissioner of the Internal Revenue Service, is reportedly being replaced by Deputy Treasury Secretary Michael Faulkender amid a power struggle between Treasury Secretary Scott Bessent and Elon Musk.

The New York Times reported that Bessent was outraged that Shapley was named to head the IRS without his knowledge or approval and complained to President Trump about it. Shapley was installed as acting commissioner on Tuesday, only to be ousted on Friday. He first gained prominence as an IRS Criminal Investigation special agent and whistleblower who testified in 2023 before the House Oversight Committee that then-President Joe Biden’s son Hunter received preferential treatment during a tax-evasion investigation, and that he and another special agent had been removed from the investigation after complaining to their supervisors in 2022. He was promoted last month to senior advisor to Bessent and made deputy chief of IRS Criminal Investigation. Shapley is now expected to remain as a senior official at IRS Criminal Investigation, according to the Wall Street Journal. The IRS and the Treasury Department press offices did not immediately respond to requests for comment.

Faulkender was confirmed last month as deputy secretary at the Treasury Department and formerly worked during the first Trump administration at the Treasury on the Paycheck Protection Program before leaving to teach finance at the University of Maryland.

Faulkender will be the fifth head of the IRS this year. Former IRS commissioner Danny Werfel departed in January, on Inauguration Day, after Trump announced in December he planned to name former Congressman Billy Long, R-Missouri, as the next IRS commissioner, even though Werfel’s term wasn’t scheduled to end until November 2027. The Senate has not yet scheduled a confirmation hearing for Long, amid questions from Senate Democrats about his work promoting the Employee Retention Credit and so-called “tribal tax credits.” The job of acting commissioner has since been filled by Douglas O’Donnell, who was deputy commissioner under Werfel. However, O’Donnell abruptly retired as the IRS came under pressure to lay off thousands of employees and share access to confidential taxpayer data. He was replaced by IRS chief operating officer Melanie Krause, who resigned last week after coming under similar pressure to provide taxpayer data to immigration authorities and employees of the Musk-led U.S. DOGE Service. 

Krause had planned to depart later this month under the deferred resignation program at the IRS, under which approximately 22,000 IRS employees have accepted the voluntary buyout offers. But Musk reportedly pushed to have Shapley installed on Tuesday, according to the Times, and he remained working in the commissioner’s office as recently as Friday morning. Meanwhile, plans are underway for further reductions in the IRS workforce of up to 40%, according to the Federal News Network, taking the IRS from approximately 102,000 employees at the beginning of the year to around 60,000 to 70,000 employees.

Accounting

On the move: EY names San Antonio office MP

Carr, Riggs & Ingram appoints CFO and chief legal officer; TSCPA hosts accounting bootcamp; and more news from across the profession.

Accounting

Tech news: Certinia announces spring release

Certinia announces spring release; Intuit acquires tech and experts from fintech Deserve; Paystand launches feature to navigate tariffs; and other accounting tech news and updates.
