AI News: 🚀 Meet ChatDoctor | LLMs Can Outperform Humans on Data Annotation | Meet ALOHA | Multimodal Language Models: The Future of Artificial Intelligence (AI)...

This newsletter brings you AI research news that is more technical than most resources but still digestible and applicable.

Meet ChatDoctor: A medical chat model fine-tuned on LLaMA using medical domain knowledge. The authors collected data on around 700 diseases and generated 5K doctor-patient conversations to fine-tune the LLM. According to the authors, a blind evaluation was conducted between ChatDoctor and ChatGPT to objectively assess their medical abilities. In recommending medications for various diseases, ChatDoctor reached a 91.25% accuracy rate, whereas ChatGPT achieved 87.5%.
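
To make the fine-tuning setup concrete, here is a minimal sketch of how generated doctor-patient dialogues could be turned into instruction-tuning records for a LLaMA fine-tune; the field names and instruction text are assumptions for illustration, not ChatDoctor's exact data format.

```python
# Minimal sketch: converting doctor-patient dialogues into instruction-tuning
# records. Field names and the instruction text are illustrative assumptions,
# not ChatDoctor's exact data format.
import json

conversations = [
    {"patient": "I have had a sore throat and a mild fever for two days.",
     "doctor": "This is most likely viral pharyngitis. Rest, drink fluids, "
               "and see a doctor if the fever persists beyond three days."},
]

records = [
    {
        "instruction": "If you are a doctor, please answer the medical question "
                       "based on the patient's description.",
        "input": turn["patient"],
        "output": turn["doctor"],
    }
    for turn in conversations
]

with open("chatdoctor_finetune.json", "w") as f:
    json.dump(records, f, indent=2)
```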

LLMs Can Outperform Humans on Data Annotation: University of Zurich researchers used 2,382 tweets to compare ChatGPT against crowd-workers and trained annotators on several annotation tasks. ChatGPT's zero-shot accuracy was higher than that of crowd-workers for four out of five tasks (relevance, stance, topic, and frame detection), and its intercoder agreement exceeded that of both crowd-workers and trained annotators on all tasks. Additionally, the cost per annotation with ChatGPT was less than $0.003, about twenty times cheaper than using MTurk. These findings demonstrate the potential of large language models to significantly improve the efficiency of text classification.
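
As a rough illustration of the zero-shot annotation setup, here is a minimal sketch using the OpenAI Python client; the prompt wording, label set, and model name are assumptions, not the paper's exact instructions.

```python
# Minimal sketch of zero-shot tweet annotation with an LLM. The prompt,
# labels, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate_stance(tweet: str) -> str:
    prompt = (
        "Classify the stance of the following tweet toward content moderation "
        "as 'in favor', 'against', or 'neutral'. Answer with the label only.\n\n"
        f"Tweet: {tweet}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic outputs help intercoder agreement
    )
    return resp.choices[0].message.content.strip()

print(annotate_stance("Platforms should not get to decide what speech is allowed."))
```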

CelebV-Text: A Large-Scale Facial Text-Video Dataset. This paper introduces CelebV-Text, a new dataset designed to address the challenges of face-centric text-to-video generation. With 70,000 diverse, high-quality facial video clips, each paired with 20 texts generated using a semi-automatic strategy, CelebV-Text enables research on facial text-to-video generation tasks. The dataset stands out for its high-quality videos and relevant, precise descriptions of both static and dynamic attributes. The paper demonstrates CelebV-Text's effectiveness through comprehensive statistical analysis and self-evaluation, and provides a benchmark for evaluating facial text-to-video generation methods. All data and models are publicly available.

Google AI's Pix2Struct is now available in 🤗 Transformers: It is one of the best document AI models out there, beating Donut by 9 points on DocVQA. The model is quite simple: a Transformer with a vision encoder and a language decoder, and no OCR is involved. It is pre-trained in a self-supervised fashion by predicting simplified HTML from masked portions of web page screenshots, and it reaches SOTA on 6 out of 9 benchmarks.
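
For reference, a minimal usage sketch with the 🤗 Transformers API is shown below; the checkpoint name and image path are assumptions, so check the Hub for the variant you need.

```python
# Minimal sketch: document VQA with a Pix2Struct checkpoint from the Hub.
# The checkpoint name and image file are assumptions for illustration.
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_id = "google/pix2struct-docvqa-base"
processor = Pix2StructProcessor.from_pretrained(model_id)
model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

image = Image.open("invoice.png")  # any document image
question = "What is the total amount due?"

inputs = processor(images=image, text=question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(outputs[0], skip_special_tokens=True))
```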

Meet ALOHA: A Low-cost Open-source HArdware System for Bimanual Teleoperation. With a $20k budget, it can teleoperate precise tasks such as threading a zip tie, dynamic tasks such as juggling a ping pong ball, and contact-rich tasks such as assembling the chain in the NIST board #2. ALOHA has two leader and two follower arms and syncs the joint positions from leaders to followers at 50 Hz. The user teleoperates simply by moving the leader robots. The sync loop takes about 10 lines to implement (a sketch follows below), yet it is intuitive and responsive anywhere within the joint limits.
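
Here is a minimal sketch of such a leader-to-follower joint sync loop; the Arm class is a hypothetical stand-in for ALOHA's actual Dynamixel-based drivers, included only so the snippet is self-contained.

```python
# Minimal sketch of leader-to-follower joint syncing at 50 Hz. The Arm class
# below is a hypothetical placeholder, not ALOHA's real driver API.
import time

class Arm:
    def __init__(self, name):
        self.name = name
        self._q = [0.0] * 6  # six joint positions (radians)
    def get_joint_positions(self):
        return list(self._q)
    def set_joint_positions(self, q):
        self._q = list(q)

leader_l, leader_r = Arm("leader_left"), Arm("leader_right")
follower_l, follower_r = Arm("follower_left"), Arm("follower_right")

DT = 1 / 50  # 50 Hz control loop
while True:
    t0 = time.time()
    # Mirror the leaders' joint positions onto the followers.
    follower_l.set_joint_positions(leader_l.get_joint_positions())
    follower_r.set_joint_positions(leader_r.get_joint_positions())
    time.sleep(max(0.0, DT - (time.time() - t0)))
```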

EVA-CLIP: The efficiency and effectiveness of contrastive language-image pre-training (CLIP) have drawn increased attention across many scenarios. A research group has proposed EVA-CLIP, which incorporates new techniques for representation learning, optimization, and augmentation, enabling it to outperform previous CLIP models with the same number of parameters at significantly smaller training cost. The largest model, EVA-02-CLIP-E/14+ with 5.0B parameters, achieved 82.0% zero-shot top-1 accuracy on ImageNet-1K val with only 9 billion seen samples, while the smaller EVA-02-CLIP-L/14+ with 430 million parameters and 6 billion seen samples achieved 80.4%.
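
To illustrate the zero-shot evaluation behind these numbers, a minimal classification sketch with the open_clip library follows; the model name, pretrained tag, and image path are assumptions, so check open_clip.list_pretrained() for the actual EVA-CLIP checkpoints.

```python
# Minimal sketch of zero-shot image classification, the setting behind the
# "zero-shot top-1" numbers above. Model name, pretrained tag, and image path
# are assumptions for illustration.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "EVA02-L-14", pretrained="merged2b_s4b_b131k")
tokenizer = open_clip.get_tokenizer("EVA02-L-14")

class_prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(class_prompts)
image = preprocess(Image.open("example.jpg")).unsqueeze(0)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(class_prompts[probs.argmax().item()])  # highest-scoring prompt
```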

Did you know Marktechpost has a community of 1.5 Million+ AI professionals and engineers? For partnership, please feel free to contact us through this form.