insideBIGDATA AI News Briefs BULLETIN BOARD for Q1 2024

Welcome to the insideBIGDATA AI News Briefs Bulletin Board, our timely new feature bringing you the latest industry insights and perspectives surrounding the field of AI, including deep learning, large language models, generative AI, and transformers. We’re working tirelessly to dig up the most timely and curious tidbits underlying the day’s most popular technologies. We know this field is advancing rapidly, and we want to give you a regular resource to keep you informed and up to date. News bites are added continuously in reverse date order (most recent on top), so check back often to see what’s happening in our rapidly accelerating industry. Click HERE to check out previous “AI News Briefs” round-ups.

[1/2/2024] GitHub repo highlight: “Large Language Model Course” – a comprehensive LLM course on GitHub that paves the way toward expertise in LLM technology.

[1/2/2024] Sam Altman and Jony Ive recruit iPhone design chief to build new AI device. Legendary designer Jony Ive, known for his iconic work at Apple, and Sam Altman are collaborating on a new artificial intelligence hardware project, enlisting former Apple executive Tang Tan to work at Ive’s design firm, LoveFrom.

[1/2/2024] Chegg Experiencing “Death by LLM”!?! Chegg began in 2005 as a disruptor, bringing online learning tools to students and transforming the landscape of education. But since the company’s stock (NYSE: CHGG) peaked in 2021, Chegg has taken a nose dive of more than 90% while facing competition from widely accessible LLMs such as ChatGPT, which launched on November 30, 2022. In August 2023, Chegg announced a partnership with Scale AI to transform its data into a dynamic learning experience for students, after already collaborating with OpenAI on Cheggmate. A recent Harvard Business Review piece highlights the potential value that Chegg’s specialized AI learning assistants may bring to a student’s learning experience by using feedback loops, instituting continuous model improvement, and training the model on proprietary datasets. The question remains, however: can Chegg effectively combine its user data with AI to reclaim lost competitive ground and take advantage of new revenue streams, or is it fighting a losing battle against the rapidly evolving GenAI ecosystem? Personally, there is no love lost between me and Chegg, as I’ve discovered a number of my Intro to Data Science students cheating on homework assignments and exams by accessing my coursework uploaded to Chegg.

[1/2/2024] AI research paper highlight: “Gemini: A Family of Highly Capable Multimodal Models,” the paper behind the new Google Gemini model release. The main problem addressed by Gemini is the challenge of creating models that can effectively understand and process multiple modalities (text, image, audio, and video) while also delivering advanced reasoning and understanding in each individual domain.

[1/2/2024] AI research paper highlight: “Generative Multimodal Models are In-Context Learners.” This research demonstrates that large multimodal models can enhance their task-agnostic in-context learning capabilities through effective scaling-up. The primary problem addressed is the struggle of multimodal systems to mimic the human ability to easily solve multimodal tasks in context – with only a few demonstrations or simple instructions. The authors propose Emu2, a new 37B generative multimodal model trained on large-scale multimodal sequences with a unified autoregressive objective. Emu2 consists of a visual encoder/decoder and a multimodal transformer. Images are tokenized with the visual encoder into a continuous embedding space and interleaved with text tokens for autoregressive modeling. Emu2 is initially pretrained only on the captioning task with both image-text and video-text paired datasets. Emu2’s visual decoder is initialized from SDXL-base and acts as a visual detokenizer implemented as a diffusion model; the VAE is kept static while the weights of the diffusion U-Net are updated. Emu2-Chat is derived from Emu2 by fine-tuning the model on conversational data, and Emu2-Gen is fine-tuned on complex compositional generation tasks. Results of the research suggest that Emu2 achieves state-of-the-art few-shot performance on multiple visual question-answering datasets and improves as the number of in-context examples increases. Emu2 also learns to follow visual prompting in context, showcasing strong multimodal reasoning capabilities for tasks in the wild. When instruction-tuned to follow specific instructions, Emu2 further sets new benchmarks on challenging tasks such as question answering for large multimodal models and open-ended subject-driven generation. A minimal sketch of the interleaving idea appears below.
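For readers curious what “interleaving continuous image embeddings with text tokens for autoregressive modeling” can look like in practice, here is a minimal, hypothetical PyTorch sketch. This is not the Emu2 code: the module names, dimensions, and the simple image-then-text ordering are illustrative assumptions only.

```python
# Toy illustration (NOT Emu2) of a multimodal autoregressive transformer that
# mixes continuous image embeddings with discrete text token embeddings.
import torch
import torch.nn as nn

class ToyInterleavedLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # Stands in for the visual encoder: projects pre-extracted image
        # features (e.g., from a ViT) into the transformer's embedding space.
        self.visual_proj = nn.Linear(1024, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, image_feats):
        # text_ids: (batch, text_len) integer token ids
        # image_feats: (batch, img_len, 1024) continuous visual features
        txt = self.text_embed(text_ids)        # (B, T, d_model)
        img = self.visual_proj(image_feats)    # (B, I, d_model)
        # Interleave: here simply [image tokens][text tokens]; a real model
        # places image spans wherever they occur in the multimodal sequence.
        seq = torch.cat([img, txt], dim=1)
        # Causal mask so each position attends only to earlier positions.
        L = seq.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.transformer(seq, mask=causal)
        return self.lm_head(hidden)            # next-token logits over text vocab

model = ToyInterleavedLM()
logits = model(torch.randint(0, 32000, (1, 16)), torch.randn(1, 8, 1024))
print(logits.shape)  # torch.Size([1, 24, 32000])
```

The key design point the sketch tries to convey is that image content enters the sequence as continuous embeddings (no discrete visual vocabulary), so the same causal transformer can attend across text and image positions under one autoregressive objective.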

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: https://twitter.com/InsideBigData1

Join us on LinkedIn: https://www.linkedin.com/company/insidebigdata/

Join us on Facebook: https://www.facebook.com/insideBIGDATANOW