submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

👋 Hello everyone, welcome to our Weekly Discussion thread!

This week, we're interested in your thoughts on AI safety: Is it an issue that you believe deserves significant attention, or is it just fearmongering motivated by financial interests?

I've created a poll to gauge your thoughts on these concerns. Please take a moment to select the AI safety issues you believe are most crucial:

VOTE HERE: 🗳️ https://strawpoll.com/e6Z287ApqnN

Here is a detailed explanation of the options:

  1. Misalignment between AI and human values: If an AI system's goals aren't perfectly aligned with human values, it could lead to unintended and potentially catastrophic consequences.

  2. Unintended Side Effects: AI systems, especially those optimized to achieve a specific goal, might engage in harmful behavior that was never intended; this is closely related to "instrumental convergence", where very different objectives push a system towards similar harmful sub-goals, such as acquiring resources or resisting shutdown.

  3. Manipulation and Deception: AI could be used to manipulate information, create deepfakes, or influence behavior without consent, eroding trust and our shared sense of reality.

  4. AI Bias: AI models may perpetuate or amplify existing biases present in the data they're trained on, leading to unfair outcomes in various sectors like hiring, law enforcement, and lending.

  5. Security Concerns: As AI systems become more integrated into critical infrastructure, the potential for these systems to be exploited or misused increases.

  6. Economic and Social Impact: Automation powered by AI could lead to significant job displacement and increase inequality, causing major socioeconomic shifts.

  7. Lack of Transparency: AI systems, especially deep learning models, are often criticized as "black boxes," where it's difficult to understand the decision-making process.

  8. Autonomous Weapons: The misuse of AI in warfare could lead to lethal autonomous weapons, potentially causing harm on a massive scale.

  9. Monopoly and Power Concentration: Advanced AI capabilities could lead to an unequal distribution of power and resources if controlled by a select few entities.

  10. Dependence on AI: Over-reliance on AI systems could potentially make us vulnerable, especially if these systems fail or are compromised.

Please share your opinion here in the comments!

submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

cross-posted from: https://programming.dev/post/314158

Announcement

The bot I announced in this thread is now ready for a limited beta release.

You can see an example summary it wrote here.

How to Use AutoTLDR

  • Just mention it ("@" + "AutoTLDR") in a comment or post, and it will generate a summary for you.
  • If mentioned in a comment, it will try to summarize the parent comment, but if there is no parent comment, it will summarize the post itself.
  • If the parent comment contains a link, or if the post is a link post, it will summarize the content at that link.
  • If there is no link, it will summarize the text of the comment or post itself (this resolution order is sketched below).
  • 🔒 If you include the #nobot hashtag in your profile, it will not summarize anything posted by you.
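
Here is a minimal sketch of how a bot like this might pick its summarization target, following the rules above. This is illustrative only, not AutoTLDR's actual code; every name in it (Item, pick_summarization_target, fetch_page_text, run_llm_summary) is hypothetical.

```python
# Hypothetical sketch of the target-resolution rules described above.
# Not AutoTLDR's real implementation; all names are made up.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """A post or comment in which the bot was mentioned."""
    text: str
    link: Optional[str] = None             # URL if the item contains/is a link
    parent_comment: Optional["Item"] = None
    post: Optional["Item"] = None          # the post a comment belongs to


def pick_summarization_target(mention: Item) -> Item:
    # Prefer the parent comment; if there is none, fall back to the post.
    return mention.parent_comment or mention.post or mention


def fetch_page_text(url: str) -> str:
    # Placeholder: a real bot would download and extract the linked page.
    return f"<contents of {url}>"


def run_llm_summary(text: str) -> str:
    # Placeholder: a real bot would call a language model here.
    return text[:200] + "..."


def summarize(mention: Item) -> str:
    target = pick_summarization_target(mention)
    # Summarize the linked content if there is a link, else the text itself.
    content = fetch_page_text(target.link) if target.link else target.text
    return run_llm_summary(content)
```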

Beta limitations

How to try it

  • If you want to test the bot, write a long comment or include a link in a comment in this thread, and then mention the bot in a reply.
  • Feel free to test it and try to break it in this thread. Please report any weird behavior you encounter in a PM to me (NOT the bot).
  • You can also use it for its designated purpose anywhere in the AUAI community.
Understanding GPT tokenizers (simonwillison.net)
submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

This is an excellent overview of tokenization with many interesting examples. I also like Simon's small CLI tools; you can read about them at the end of the post.

As usual, I've asked GPT-4 to write a TL;DR and detailed notes for it.

Notice that it couldn't print the "davidjl" glitch token, and (probably because of its presence) the notes are also incomplete. At first I thought this was because the article was longer than the context window, but the TL;DR contains details that the notes don't, so that probably wasn't the case.

I've still decided to copy the notes here because they are generally useful and also demonstrate this weird behavior.

TL;DR (by GPT-4 🤖)

The article discusses the concept of tokenization in large language models like GPT-3/4, LLaMA, and PaLM. These models convert text into tokens (integers) and predict the next tokens. The author explains how English words are usually assigned a single token, while non-English languages often have less efficient tokenization. The article also explores "glitch tokens," which exhibit unusual behavior, and the necessity of counting tokens to ensure OpenAI's models' token limit is not exceeded. The author introduces a Python library called tiktoken and a command-line tool called ttok for this purpose. Understanding tokens can help make sense of how GPT tools generate text.

Notes (by GPT-4 🤖)

Understanding GPT Tokenizers

  • Large language models like GPT-3/4, LLaMA, and PaLM operate in terms of tokens, which are integers representing text. They convert text into tokens and predict the next tokens.
  • OpenAI provides a Tokenizer tool for exploring how tokens work. The author has also built a tool as an Observable notebook.
  • The notebook can convert text to tokens, tokens to text, and run searches against the full token table.

Tokenization Examples

  • English words are usually assigned a single token. For example, "The" is token 464, " dog" is token 3290, and " eats" is token 25365.
  • Capitalization and leading spaces are important in tokenization. For instance, "The" with a capital T is token 464, but " the" with a leading space and a lowercase t is token 262.
  • Languages other than English often have less efficient tokenization. For example, the Spanish sentence "El perro come las manzanas" is encoded into seven tokens, while the English equivalent "The dog eats the apples" is encoded into five tokens (see the snippet after this list).
  • Some languages may have single characters that encode to multiple tokens, such as certain Japanese characters.
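
The token IDs and counts quoted above can be reproduced with the tiktoken library mentioned later in the notes. A minimal sketch, assuming tiktoken is installed and that these IDs come from the GPT-2/GPT-3-era vocabulary (tiktoken's "gpt2" encoding):

```python
# Minimal sketch: reproduce the English vs. Spanish token counts above.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2/GPT-3-era vocabulary

english = "The dog eats the apples"
spanish = "El perro come las manzanas"

english_tokens = enc.encode(english)
spanish_tokens = enc.encode(spanish)

# The first three IDs should match 464, 3290, 25365 from the notes above.
print(len(english_tokens), english_tokens)
print(len(spanish_tokens), spanish_tokens)
```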

Glitch Tokens and Token Counting

  • There are "glitch tokens" that exhibit unusual behavior. For example, token 23282β€”"djl"β€”is one such glitch token. It's speculated that this token refers to a Reddit user who posted incremented numbers hundreds of thousands of times, and this username ended up getting its own token in the training data.
  • OpenAI's models have a token limit, and it's sometimes necessary to count the number of tokens in a string before passing it to the API to ensure the limit is not exceeded. OpenAI provides a Python library called tiktoken for this purpose.
  • The author also introduces a command-line tool called ttok, which can count tokens in text and truncate text down to a specified number of tokens (a Python equivalent is sketched below).
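
A rough Python equivalent of the counting and truncating described above, again using tiktoken (not the exact ttok implementation; the 4,000-token budget below is just an illustrative number, not a claim about any particular model):

```python
# Sketch: count tokens and truncate a prompt to fit under a token budget.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def truncate_to_tokens(text: str, limit: int) -> str:
    # Encode, keep the first `limit` tokens, and decode back to text.
    return enc.decode(enc.encode(text)[:limit])

prompt = "Some very long article text ... " * 1000
if count_tokens(prompt) > 4000:            # illustrative budget
    prompt = truncate_to_tokens(prompt, 4000)
print(count_tokens(prompt))
```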

Token Generation

  • Understanding tokens can help make sense of how GPT tools generate text. For example, names not in the dictionary, like "Pelly", take multiple tokens, but "Captain Gulliver" outputs the token "Captain" as a single chunk.
submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

TL;DR (by GPT-4 🤖):

Prompt Engineering, or In-Context Prompting, is a method of guiding large language models (LLMs) towards desired outcomes without changing the model weights. The article discusses various techniques such as basic prompting, instruction prompting, self-consistency sampling, Chain-of-Thought (CoT) prompting, automatic prompt design, augmented language models, retrieval, and the use of programming languages and external APIs. The effectiveness of these techniques can vary significantly among models, necessitating extensive experimentation and heuristic approaches. The article emphasizes the importance of selecting diverse and relevant examples, giving precise instructions, and using external tools to enhance the model's reasoning skills and knowledge base.

Notes (by GPT-4 🤖):

Prompt Engineering: An Overview

  • Introduction
    • Prompt Engineering, also known as In-Context Prompting, is a method to guide the behavior of large language models (LLMs) towards desired outcomes without updating the model weights.
    • The effectiveness of prompt engineering methods can vary significantly among models, necessitating extensive experimentation and heuristic approaches.
    • This article focuses on prompt engineering for autoregressive language models, excluding cloze tests, image generation, and multimodal models.
  • Basic Prompting
    • Zero-shot and few-shot learning are the two most basic approaches for prompting the model.
    • Zero-shot learning involves feeding the task text to the model and asking for results.
    • Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task.
  • Tips for Example Selection and Ordering
    • Examples should be chosen that are semantically similar to the test example.
    • The selection of examples should be diverse, relevant to the test sample, and in random order to avoid biases.
  • Instruction Prompting
    • Instruction prompting involves giving the model direct instructions, which can be more token-efficient than few-shot learning.
    • Models like InstructGPT are fine-tuned with high-quality tuples of (task instruction, input, ground truth output) to better understand user intention and follow instructions.
  • Self-Consistency Sampling
    • Self-consistency sampling involves sampling multiple outputs and selecting the best one out of these candidates.
    • The criteria for selecting the best candidate can vary from task to task.
  • Chain-of-Thought (CoT) Prompting
    • CoT prompting generates a sequence of short sentences that describe the reasoning steps one by one, leading to the final answer.
    • CoT prompting can be either few-shot or zero-shot (a toy example of these prompt styles follows these notes).
  • Automatic Prompt Design
    • Automatic Prompt Design involves treating prompts as trainable parameters and optimizing them directly on the embedding space via gradient descent.
  • Augmented Language Models
    • Augmented Language Models are models that have been enhanced with reasoning skills and the ability to use external tools.
  • Retrieval
    • Retrieval is used for tasks that require knowledge more recent than the model's pretraining cutoff, or knowledge from an internal/private knowledge base.
    • Many methods for Open Domain Question Answering depend on first doing retrieval over a knowledge base and then incorporating the retrieved content as part of the prompt.
  • Programming Language and External APIs
    • Some models generate programming language statements to resolve natural language reasoning problems, offloading the solution step to a runtime such as a Python interpreter.
    • Other models are augmented with text-to-text API calls, guiding the model to generate API call requests and append the returned result to the text sequence.
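
The notes above stay abstract, so here is a toy illustration of the difference between zero-shot, few-shot, and few-shot chain-of-thought prompting. The task, reviews, and wording are invented for illustration and are not taken from the article; no particular model or API is assumed.

```python
# Toy sketch: zero-shot vs. few-shot vs. few-shot chain-of-thought prompts.
task = "Decide whether the following review is positive or negative."
review = "The battery died after two days, but the screen is gorgeous."

zero_shot = f"{task}\nReview: {review}\nAnswer:"

few_shot = (
    f"{task}\n"
    "Review: Absolutely loved it, works perfectly.\nAnswer: positive\n"
    "Review: Broke within a week, waste of money.\nAnswer: negative\n"
    f"Review: {review}\nAnswer:"
)

# CoT demonstrations add an explicit reasoning step before each answer,
# nudging the model to reason step by step on the new input as well.
few_shot_cot = (
    f"{task}\n"
    "Review: Broke within a week, waste of money.\n"
    "Reasoning: The reviewer reports a defect and regrets the purchase, "
    "so the sentiment is negative.\nAnswer: negative\n"
    f"Review: {review}\nReasoning:"
)

print(zero_shot)
print(few_shot)
print(few_shot_cot)
```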

cross-posted from: https://programming.dev/post/222613

Although I prefer the Pro Git book, it's clear that different resources are helpful to different people. For those looking to get an understanding of Git, I've linked to Git for Beginners: Zero to Hero 🐙

The author of "Git for Beginners: Zero to Hero πŸ™" posted the following on Reddit:

Hey there folks!

I've rewritten the git tutorial I've used over the years whenever newbies at work and friends come to me with complex questions but lack the git basics to actually learn.

After discussing my git shortcuts and aliases elsewhere and over DMs it was suggested to me that I share it here.

I hope it helps even a couple of y'all looking to either refresh, jumpstart, or get a good grasp of how common git concepts relate to one another!

It goes without saying that any and all feedback is welcome and appreciated 👍

TL;DR: re-wrote a git tutorial that has helped friends and colleagues get a better grasp of git: https://jdsalaro.com/blog/git-tutorial/

EDIT:

I've been a bit overwhelmed by the support and willingness to provide feedback, so I've enabled hypothes.is on https://jdsalaro.com for /u/NervousQuokka and anyone else wanting to chime in. You can now highlight and comment on snippets. ⚠️ Please join the feedback@jdsalaro group via this link https://hypothes.is/groups/BrRxenZW/feedback-jdsalaro so any highlights, comments, and notes are visible to me and stay nicely grouped. Using hypothes.is for this is an experiment for me, so let's see how it goes :)

https://old.reddit.com/r/learnprogramming/comments/14i14jv/rewrote_my_zero_to_hero_git_tutorial_and_was_told/

cross-posted from: https://programming.dev/post/216322

From the “About” section:

goblin.tools is a collection of small, simple, single-task tools, mostly designed to help neurodivergent people with tasks they find overwhelming or difficult.

Most tools will use AI technologies in the back-end to achieve their goals. Currently this includes OpenAI's models. As the tools and backend improve, the intent is to move to an open source alternative.

The AI models used are general purpose models, and so the accuracy of their output can vary. Nothing returned by any of the tools should be taken as a statement of truth, only guesswork. Please use your own knowledge and experience to judge whether the result you get is valid.
