Australia – Allens AI Australian Law Benchmark
The last 24 months have seen generative AI advance in leaps and bounds, exemplified by remarkable developments in large language models (LLMs). Their new capabilities are having a significant impact on the way businesses operate, including the legal function. However, is generative AI any good when it comes to the law and, particularly, Australian law?
Our alliance partner in Australia, Allens, has developed the Allens AI Australian Law Benchmark to test the ability of LLMs to answer legal questions. They tested general-purpose implementations of market-leading LLMs, approximating how a lay user might try to answer legal questions using AI, instead of a human lawyer.
Key findings
- The strongest overall performer was GPT-4, followed by Perplexity. The remaining models tested, LLaMa 2, Claude 2 and Gemini-1, performed similarly to one another.
- Even the best-performing LLMs tested were not consistently reliable when asked to answer legal questions. While these LLMs could have a practical role in assisting legal practitioners to summarise relatively well-understood areas of law, inconsistencies in performance mean these outputs still need careful review by someone able to verify their accuracy and correctness.
- For tasks that involve critical reasoning, none of the general-purpose tools tested can be relied upon to produce correct legal advice. They frequently produced answers that got the law wrong and/or missed the point of the question, while presenting those answers with unjustified confidence.
- These models should not be used to provide Australian law legal advice without expert human supervision. There are real risks in using these tools to generate such advice if you do not already know the answer.
The “infection” issue
One of the more interesting points in the benchmark is confirmation that the answers for smaller jurisdictions such as Australia are “infected” by legal analysis from larger jurisdictions with similar, but different, laws.
Despite being asked to answer from an Australian law perspective, many of the responses cited authorities from UK and EU law, or incorporated UK and EU law analysis that is not correct for Australian law.
In one notable example, Claude 2 was asked whether intellectual property rights could be used to prevent the re-use of a price index. It responded by citing “Relevant case law interpreting these laws (e.g., Navitaire Inc v Jetstar Airways Pty Ltd [2022] FCAFC 84)”. This is a fictitious case with remarkable similarities to the seminal English law case Navitaire Inc v EasyJet Airline Co. [2004] EWHC 1725 (Ch).
It appears that Claude 2 was seeking to “Australianise” the case name by:
- changing the court citation from “EWHC” (referring to the High Court of England and Wales) to “FCAFC” (referring to the Federal Court of Australia); and
- replacing the reference to the low-cost UK airline easyJet with the Australian budget airline Jetstar.
It is hard not to applaud Claude 2’s initiative, but creating fictitious cases is not a helpful feature in an electronic legal assistant. (In any event, it is also hard to see why the English Navitaire decision, which concerned copyright in software, is relevant to this question.)
Wider findings – Is the solution RAG?
Allens’ findings are broadly consistent with our LinksAI English law benchmark from October last year.
Since we published that report, we have benchmarked a number of other systems, some of which perform at a significantly higher level. One system we tested appeared capable, for example, of preparing a sensible first draft of a summary of the law in a particular area, although that summary still needed careful review to verify its accuracy, correctness and the absence of fictitious citations. However, even that system struggled with questions involving critical reasoning or clause analysis.
There has been some discussion about whether these problems can be addressed by tools specifically designed to answer legal questions, particularly those using retrieval-augmented generation (“RAG”). RAG works by taking a query, searching a knowledge base of trusted materials, ranking the retrieved material and then using the LLM to synthesise a response from it.
This approach should deliver more reliable results, as it more tightly constrains the materials the LLM works from and, importantly, allows the LLM to link back to the source material so the reader can verify its conclusions.
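To make that pattern concrete, the sketch below walks through the query, retrieve, rank and synthesise steps in plain Python. The knowledge base, the naive keyword-overlap scoring and the llm_complete function are hypothetical placeholders rather than any particular legal AI product's implementation; real tools would use embedding-based search over curated legal sources and a vendor's LLM API.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) loop described above.
# All names here (KNOWLEDGE_BASE, score, retrieve, llm_complete) are illustrative
# placeholders, not any specific legal AI tool's implementation.

from dataclasses import dataclass


@dataclass
class Document:
    citation: str  # e.g. a case name or statutory reference
    text: str      # the trusted source material


# 1. A curated knowledge base of trusted materials (case law, legislation, commentary).
KNOWLEDGE_BASE = [
    Document("Source A", "Illustrative extract on copyright in compilations of data."),
    Document("Source B", "Illustrative extract on contractual restrictions on re-use."),
]


def score(query: str, doc: Document) -> int:
    """Rank a document by naive keyword overlap (real systems use embeddings / vector search)."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve(query: str, k: int = 2) -> list[Document]:
    """2. Search and rank the knowledge base, keeping only the top-k documents."""
    return sorted(KNOWLEDGE_BASE, key=lambda d: score(query, d), reverse=True)[:k]


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to an LLM; a real system would call a model API here."""
    return "<model-generated answer citing the labelled sources>"


def answer(query: str) -> str:
    """3. Synthesise a response grounded in, and citing, the retrieved sources."""
    sources = retrieve(query)
    context = "\n\n".join(f"[{d.citation}] {d.text}" for d in sources)
    prompt = (
        "Answer the question using ONLY the sources below, citing them by label.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return llm_complete(prompt)


if __name__ == "__main__":
    print(answer("Can intellectual property rights prevent re-use of a price index?"))
```

The key design point is the final prompt: because the model is instructed to answer only from the retrieved, labelled sources, a reader can check each conclusion against the material it links back to.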
Whether this will eliminate hallucinations entirely remains to be seen. A recent study by Stanford University’s Human-Centered Artificial Intelligence institute found that specialised legal AI tools reduce the hallucination problem associated with general-purpose LLMs, but are a long way from eliminating it (here). However, Stanford’s findings have been contested by some of the providers of these tools (here).
The Allens AI Australian Law Benchmark can be found here.