
Founded Year

2013

Series B | Alive

Total Raised

$32.68M

Last Raised

$32.68M | 4 yrs ago

About Amboss

Amboss operates a medical technology company. It provides a knowledge platform for medical students and clinicians, offering a library and a question bank for study and exam preparation, as well as a clinical reference tool for evidence-based decision-making in patient care, primarily serving the healthcare and education industries. The company was founded in 2013 and is based in Berlin, Germany.

Headquarters Location

Torstrasse 19

Berlin, 10119, Germany

+49 (0) 30 5770221-0



Latest Amboss News

Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study

Jan 18, 2024

Authors of this article: Department of Ophthalmology, American University of Beirut Medical Center, Beirut, Lebanon

Corresponding Author: Bliss Street

Abstract

Background: ChatGPT and other large language models have gained attention recently for their ability to answer questions on various examinations across various disciplines. The question of whether ChatGPT could be used to aid in medical education is yet to be answered, particularly in the field of ophthalmology.

Objective: The aim of this study is to assess the ability of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4.0 (GPT-4.0) to answer ophthalmology-related questions across different levels of ophthalmology training.

Methods: Questions from the United States Medical Licensing Examination (USMLE) steps 1 (n=44), 2 (n=60), and 3 (n=28) were extracted from AMBOSS, and 248 questions (64 easy, 122 medium, and 62 difficult questions) were extracted from the book, Ophthalmology Board Review Q&A, for the Ophthalmic Knowledge Assessment Program and the Board of Ophthalmology (OB) Written Qualifying Examination (WQE). Questions were prompted identically and inputted to GPT-3.5 and GPT-4.0.

Results: GPT-3.5 achieved a total of 55% (n=210) of correct answers, while GPT-4.0 achieved a total of 70% (n=270) of correct answers. GPT-3.5 answered 75% (n=33) of questions correctly in USMLE step 1, 73.33% (n=44) in USMLE step 2, 60.71% (n=17) in USMLE step 3, and 46.77% (n=116) in the OB-WQE. GPT-4.0 answered 70.45% (n=31) of questions correctly in USMLE step 1, 90.32% (n=56) in USMLE step 2, 96.43% (n=27) in USMLE step 3, and 62.90% (n=156) in the OB-WQE.
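The per-examination percentages in the Results can be rechecked from the counts alone. The following is a minimal sketch, not the authors' analysis code: it assumes each percentage is simply correct answers divided by the number of questions stated in the Methods (44, 60, 28, and 248), and the names `REPORTED`, `percent`, and `summarize` are hypothetical helpers introduced for illustration.

```python
# Counts as stated in the abstract: (correct answers, questions asked).
# The dictionary layout and helper names are illustrative assumptions.
REPORTED = {
    "GPT-3.5": {
        "USMLE step 1": (33, 44),
        "USMLE step 2": (44, 60),
        "USMLE step 3": (17, 28),
        "OB-WQE": (116, 248),
    },
    "GPT-4.0": {
        "USMLE step 1": (31, 44),
        "USMLE step 2": (56, 60),
        "USMLE step 3": (27, 28),
        "OB-WQE": (156, 248),
    },
}


def percent(correct, total):
    """Return correct/total as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


def summarize(counts):
    """Return per-exam percentages plus the pooled correct, total, and percentage."""
    per_exam = {exam: percent(c, n) for exam, (c, n) in counts.items()}
    total_correct = sum(c for c, _ in counts.values())
    total_questions = sum(n for _, n in counts.values())
    overall = percent(total_correct, total_questions)
    return per_exam, total_correct, total_questions, overall


for model, counts in REPORTED.items():
    per_exam, correct, total, overall = summarize(counts)
    print(f"{model}: {correct}/{total} correct overall ({overall}%)")
    for exam, pct in per_exam.items():
        print(f"  {exam}: {pct}%")
```

The pooled totals come out at 210 and 270 correct answers out of 380 questions, in line with the rounded 55% and 70% quoted. Recomputed from the counts as stated, GPT-4.0 on USMLE step 2 comes to 93.33% (56 of 60) rather than the 90.32% quoted; the sketch does not attempt to resolve which figure is intended.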
GPT-3.5 performed more poorly as examination levels advanced (P<.001), while GPT-4.0 performed better on USMLE steps 2 and 3 and worse on USMLE step 1 and the OB-WQE (P<.001). The coefficient of correlation (r) between ChatGPT answering correctly and human users answering correctly was 0.21 (P=.01) for GPT-3.5 as compared to –0.31 (P<.001) for GPT-4.0. GPT-3.5 performed similarly across difficulty levels, while GPT-4.0 performed more poorly with an increase in the difficulty level. Both GPT models performed significantly better on certain topics than on others.

Conclusions: ChatGPT is far from being considered a part of mainstream medical education. Future models with higher accuracy are needed for the platform to be effective in medical education.

JMIR Med Educ 2024;10:e50842

Principal Findings

Our results indicate that GPT-4.0 is superior to GPT-3.5, and that GPT-3.5 has a below-average accuracy in answering questions correctly. The total proportion of correct answers for GPT-3.5 was 55% (n=210), which is considered a poor performance, while that of GPT-4.0 was 70% (n=270), which is an almost average performance [7]. Students typically must achieve 59%-60% of correct answers to pass, and students perform with an average of around 70%-75% on the aforementioned board examinations [7]. It is interesting to note that GPT-3.5’s performance decreased as examination levels increased. This is probably due to the more clinical nature of the examinations. This was not the case for GPT-4.0, which performed best on USMLE steps 2 and 3.

This study investigates the correlation between ChatGPT-3.5 and -4.0 providing a correct answer and the percentage of human users who provided the answer correctly on AMBOSS. For GPT-3.5, a correlation coefficient of 0.21 (P=.01) was noted, whereas this correlation coefficient was –0.31 (P<.001) for GPT-4.0. This implies that GPT-4.0 performed better on questions that fewer users answered correctly.
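A correlation between a right-or-wrong outcome and a percentage can be computed as an ordinary Pearson coefficient with the outcome coded as 0 or 1 (often called a point-biserial correlation). The sketch below is an illustration under that assumption; the article excerpt does not state which correlation method was used, the function name `pearson_r` is introduced here, and the six data points are invented purely to show the mechanics.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, NOT taken from the study:
# 1 = the model answered the question correctly, 0 = incorrectly.
model_correct = [1, 1, 0, 1, 0, 0]
# Share of human users (in %) who answered the same question correctly.
human_pct_correct = [82.0, 74.0, 41.0, 66.0, 58.0, 35.0]

r = pearson_r(model_correct, human_pct_correct)
print(f"r = {r:.2f}")
```

A positive r means the model tends to be right on the questions that more users get right, as in these invented data; a negative r, like the one reported for GPT-4.0, means the model is right more often on questions that fewer users answer correctly.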
Although our study did not divide the questions into categories such as diagnosis, treatment, basic knowledge, or surgical planning questions, a close look at the lens and cataract section, in which GPT-3.5 failed (32% of correct answers), showed that all the correct answers were basic knowledge questions. Surprisingly, an analysis of incorrect answers showed that almost half of the incorrectly answered questions were also basic knowledge questions. For instance, in one of the questions, the model was unable to identify the collagen fiber type in cataract, a piece of information that is widely available on the internet. On the other hand, GPT-4.0 performed significantly better on basic knowledge questions. One may postulate that since GPT-4.0 was fed a larger database than GPT-3.5, it is better able to answer basic knowledge questions. A study by Taloni et al [8] also noted a significant difference in performance between the 2 models in the cataract and anterior segment diseases categories. It is unclear why GPT-3.5 performed so poorly in the lens and cataract section. It could be hypothesized that management of lens and cataract diseases is mostly surgical, and that such surgical content may not have been fed into the model. Furthermore, surgical management requires input from images and videos, which were excluded from our paper and may have caused the drastic difference in performance. Further studies with more questions are needed to answer this question.

Table 2 outlines the percentage of correct answers based on the difficulty level for both models. GPT-4.0 performed more poorly on questions of greater difficulty in both the AMBOSS and OB-WQE question sets, whereas this trend was not significant for GPT-3.5, indicating that it performed almost equally well across difficulty levels. Gilson et al [7] also reported a similar finding for GPT-3.5. Further studies are needed to explain those findings.
This study also examined the proportion of correct answers based on the different topics. Both models performed significantly better on certain topics than others. This is a novel finding not reported in other studies assessing the performance of ChatGPT. It would be interesting to explore this association further and to ask why a model would perform better on certain topics than on others. It could be hypothesized that questions on topics such as oculoplastics, which rely on surgical techniques and knowledge of aesthetics, may be more difficult for AI models to answer correctly than topics such as oncology and pathology, which rely more on clinical knowledge. Taloni et al [8] reported a better performance of ChatGPT on clinical rather than surgical cases.

The moderate accuracy of ChatGPT-3.5 has been widely replicated in various studies. Gilson et al [7] found accuracies ranging between 42% and 64.4% in USMLE steps 1 and 2 examinations, numbers similar to those noted in this study. The paper also records a decrease in the proportion of correct answers as difficulty level increases, which has been noted in this study as well. Another study by Huh [9] showed that ChatGPT’s performance was significantly lower than that of Korean medical students in a parasitology examination. A letter to the editor of the journal Resuscitation revealed that ChatGPT did not reach the passing threshold for the Life Support examination [10]. The cited studies indicate the moderate capabilities of ChatGPT in answering clinically related questions. More studies are needed to show how we can best optimize ChatGPT for medical education. Mihalache et al [11] assessed the performance of ChatGPT on the OKAP and found that it provided 46% correct answers, not unlike the proportion of OB-WQE questions correctly answered by GPT-3.5 in this study. All the aforementioned studies used ChatGPT-3.5 in their analysis. More recent studies have assessed the efficacy of ChatGPT-4.0.
A study by Lim et al [12] assessed the performance of GPT-4.0 on myopia-related questions, and the model performed with 80.6% adequate responses, compared to 61.3% for GPT-3.5. Taloni et al [8] assessed the use of ChatGPT-4.0 and ChatGPT-3.5 in the American Academy of Ophthalmology’s self-assessment questions; their study found that GPT-4.0 (82.4% of correct answers) performed better than both humans (75.7% of correct answers) and GPT-3.5 (65.9% of correct answers). The study also assessed the performance of these models across various topics [8]. Similar to our results, Taloni et al [8] found that ChatGPT performed better on ocular oncology and pathology compared to topics such as strabismus and pediatric ophthalmology. To our knowledge, our study is among the first few to assess the abilities of GPT-4.0 in medical examinations across various levels of education and various board examinations.

When reviewing the explanations provided by ChatGPT, it was noted that the model would randomly either explain the provided answer choice or not. It is particularly remarkable to read how it justified the wrong answer choices. More studies are needed to emphasize and assess the answer justifications of the model. Indeed, having solid explanations is essential for it to become a reliable medical education tool.

Our study is unique in that it assesses the capabilities of ChatGPT in answering ophthalmology-related questions, in contrast to other studies that assessed its ability to succeed in general examinations such as USMLE steps 1 and 2. Furthermore, this is the first study to assess the ability of ChatGPT to answer questions of a certain discipline across all its examination levels. Finally, this is among the first studies to compare GPT-4.0’s performance to GPT-3.5’s performance in medical examinations.

ChatGPT can be a great add-on to mainstream resources to study for board examinations.
There have been reports of using it to generate clinical vignettes and board examination–like questions, which can create more unique practice opportunities for students. Additionally, our study also assesses the accuracy of the 2 models on board examination questions related to ophthalmology. Students can input questions they need help with on the platform and receive an answer and explanation. If the student is not satisfied with the answer provided, or has further questions, he or she can respond to the model and receive a more personalized answer. This is crucial as it significantly decreases the time needed to study and also creates a study experience tailored to each student’s needs.

However, ChatGPT needs further optimization before it can be considered a mainstream tool for medical education. The image feature was not present in GPT-3.5 and was introduced in GPT-4.0. This feature is available only on demand and is yet to be available to all users. Its accuracy and reliability are yet to be established for examination purposes. Many questions were excluded because they contained images, which is a considerable limitation considering the visual nature of ophthalmology. Even in the text-only questions, ChatGPT had moderate accuracy in answering questions across different difficulties and levels.

This study is, however, limited by the small number of questions, particularly in the USMLE steps, due to the absence of a large number of ophthalmology questions in the resources used to prepare for these examinations. More studies with a larger number of questions are needed. This study also does not assess the repeatability of ChatGPT’s answers; however, a study by Antaki et al [13] reported near-perfect repeatability.

Conclusions

Overall, this study suggests that ChatGPT has moderate accuracy in answering questions. Its accuracy decreases as the examinations become more advanced and more clinical in nature.
In its current state, ChatGPT does not seem to be the ideal medium for medical education and preparation for board examinations. Future models with more robust capabilities may soon become part of mainstream medical education. More studies with a larger number of questions are needed to verify the results of this study and to attempt to find explanations for many of the intriguing findings.

Acknowledgments

We thank AMBOSS and Thieme Publishers for granting access to the questions for use in this present study. All authors declared that they had insufficient or no funding to support open access publication of this manuscript, including from affiliated organizations or institutions, funding agencies, or other organizations. JMIR Publications provided article processing fee (APF) support for the publication of this article.

Conflicts of Interest

References

1. Gozalo-Brizuela R, Garrido-Merchan EC. ChatGPT is not all you need. A state of the art review of large generative AI models. arXiv. Preprint posted online January 11, 2023.
2. Castelvecchi D. Are ChatGPT and AlphaCode going to replace programmers? Nature. Dec 08, 2022.
3. Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol. Oct 05, 2023.
4. Azaria A. ChatGPT usage and limitations. OSF Preprints. Preprint posted online December 27, 2022.
5. Powerful learning and clinical tools combined into one platform. AMBOSS. [accessed 2023-03-05]
6. Smith BT, Bottini AR. Graefes Arch Clin Exp Ophthalmol. Jul 15, 2021;259(8):2457-2458.
7. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. Feb 08, 2023;9:e45312.
8. Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, et al. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci Rep. Oct 29, 2023;13(1):18562.
9. Huh S. Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study. J Educ Eval Health Prof. 2023;20:1.
10. Fijačko N, Gosak L, Štiglic G, Picard CT, John Douma M. Can ChatGPT pass the life support exams without entering the American heart association course? Resuscitation. Apr 2023;185:109732.
11. Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. Jun 01, 2023;141(6):589-597.
12. Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun C, Lam JSH, et al. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine. Sep 2023;95:104770.
13. Antaki F, Touma S, Milad D, El-Khoury J, Duval R. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. Dec 2023;3(4):100324.

Amboss Frequently Asked Questions (FAQ)

  • When was Amboss founded?

    Amboss was founded in 2013.

  • Where is Amboss's headquarters?

    Amboss's headquarters is located at Torstrasse 19, Berlin.

  • What is Amboss's latest funding round?

    Amboss's latest funding round is Series B.

  • How much did Amboss raise?

    Amboss raised a total of $32.68M.

  • Who are the investors of Amboss?

    Investors of Amboss include Partech Partners, Wellington Partners, Holtzbrinck Digital, Cherry Ventures, Target Hero and 3 more.

  • Who are Amboss's competitors?

    Competitors of Amboss include Tonic App and 3 more.


Compare Amboss to Competitors

Sermo

Sermo is a company that focuses on providing a social network platform specifically for physicians, operating within the healthcare and technology sectors. The company's main service is to offer a platform where physicians can connect globally, share knowledge, solve patient cases, rate drugs, and participate in paid medical research studies. This platform primarily serves the healthcare industry. It was founded in 2000 and is based in New York, New York.


UpHill

UpHill is making healthcare safer by delivering training and performance analysis software.

MedShr

MedShr is a technology platform operating in the healthcare sector. The company provides a secure network for medical professionals to discover, discuss, and share clinical cases and medical images. Its primary customers are doctors, healthcare professionals, and medical students. It was founded in 2015 and is based in London, England.

Kemtai

Kemtai is a company that specializes in computer vision technology within the fitness and healthcare sectors. The company offers a motion tracking exercise platform that provides real-time guidance and training feedback for physiotherapy and wellness exercises. This platform primarily serves the physiotherapy, corporate wellness, and fitness industries. It is based in Tel Aviv, Israel.

Axess Health

Axess Health is a technology company focused on the healthcare sector. It provides a free, secure communications hub that connects healthcare professionals across Africa, offering real-time medical news, tools, and continuing medical education (CME) resources. The company primarily serves the healthcare industry, including doctors, nurses, field workers, and supply-side professionals such as pharmaceutical companies, device manufacturers, and equipment suppliers. It is based in South Africa.

Future Cardia

Future Cardia develops an insertable cardiac device for digital health and provides cardiac monitoring. It offers insertable cardiac monitors, data-driven therapy, monitoring solutions, and more. Future Cardia was formerly known as Oracle Health. It was founded in 2019 and is based in Houston, Texas.

