submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

👋 Hello everyone, welcome to our Weekly Discussion thread!

This week, we're interested in your thoughts on AI safety: Is it an issue that you believe deserves significant attention, or is it just fearmongering motivated by financial interests?

I've created a poll to gauge your thoughts on these concerns. Please take a moment to select the AI safety issues you believe are most crucial:

VOTE HERE: 🗳️ https://strawpoll.com/e6Z287ApqnN

Here is a detailed explanation of the options:

  1. Misalignment between AI and human values: If an AI system's goals aren't perfectly aligned with human values, it could lead to unintended and potentially catastrophic consequences.

  2. Unintended Side Effects: AI systems, especially those optimized to achieve a specific goal, might engage in harmful behavior that was never intended; this is closely related to "instrumental convergence", where very different objectives push a system towards similar harmful sub-goals, such as acquiring resources or resisting shutdown.

  3. Manipulation and Deception: AI could be used to manipulate information, create deepfakes, or influence behavior without consent, eroding trust and our shared sense of reality.

  4. AI Bias: AI models may perpetuate or amplify existing biases present in the data they're trained on, leading to unfair outcomes in various sectors like hiring, law enforcement, and lending.

  5. Security Concerns: As AI systems become more integrated into critical infrastructure, the potential for these systems to be exploited or misused increases.

  6. Economic and Social Impact: Automation powered by AI could lead to significant job displacement and increase inequality, causing major socioeconomic shifts.

  7. Lack of Transparency: AI systems, especially deep learning models, are often criticized as "black boxes," where it's difficult to understand the decision-making process.

  8. Autonomous Weapons: The misuse of AI in warfare could lead to lethal autonomous weapons, potentially causing harm on a massive scale.

  9. Monopoly and Power Concentration: Advanced AI capabilities could lead to an unequal distribution of power and resources if controlled by a select few entities.

  10. Dependence on AI: Over-reliance on AI systems could potentially make us vulnerable, especially if these systems fail or are compromised.

Please share your opinion here in the comments!

submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

cross-posted from: https://programming.dev/post/314158

Announcement

The bot I announced in this thread is now ready for a limited beta release.

You can see an example summary it wrote here.

How to Use AutoTLDR

  • Just mention it ("@" + "AutoTLDR") in a comment or post, and it will generate a summary for you.
  • If mentioned in a comment, it will try to summarize the parent comment, but if there is no parent comment, it will summarize the post itself.
  • If the parent comment contains a link, or if the post is a link post, it will summarize the content at that link.
  • If there is no link, it will summarize the text of the comment or post itself (this resolution order is sketched below).
  • 🔒 If you include the #nobot hashtag in your profile, it will not summarize anything posted by you.
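
Here is a minimal sketch of how a bot like this might pick its summarization target, following the rules above. This is illustrative only, not AutoTLDR's actual code; every name in it (Item, pick_summarization_target, fetch_page_text, run_llm_summary) is hypothetical.

```python
# Hypothetical sketch of the target-resolution rules described above.
# Not AutoTLDR's real implementation; all names are made up.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """A post or comment in which the bot was mentioned."""
    text: str
    link: Optional[str] = None             # URL if the item contains/is a link
    parent_comment: Optional["Item"] = None
    post: Optional["Item"] = None          # the post a comment belongs to


def pick_summarization_target(mention: Item) -> Item:
    # Prefer the parent comment; if there is none, fall back to the post.
    return mention.parent_comment or mention.post or mention


def fetch_page_text(url: str) -> str:
    # Placeholder: a real bot would download and extract the linked page.
    return f"<contents of {url}>"


def run_llm_summary(text: str) -> str:
    # Placeholder: a real bot would call a language model here.
    return text[:200] + "..."


def summarize(mention: Item) -> str:
    target = pick_summarization_target(mention)
    # Summarize the linked content if there is a link, else the text itself.
    content = fetch_page_text(target.link) if target.link else target.text
    return run_llm_summary(content)
```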

Beta limitations

How to try it

  • If you want to test the bot, write a long comment or include a link in a comment in this thread, and then mention the bot in a reply.
  • Feel free to test it and try to break it in this thread. Please report any weird behavior you encounter in a PM to me (NOT the bot).
  • You can also use it for its designated purpose anywhere in the AUAI community.
Understanding GPT tokenizers (simonwillison.net)
submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

This is an excellent overview of tokenization with many interesting examples. I also like Simon's small CLI tools; you can read about them at the end of the post.

As usual, I've asked GPT-4 to write a TL;DR and detailed notes for it.

Notice that it couldn't print the "davidjl" glitch token, and (probably because of its presence) the notes are also incomplete. At first I thought this was because the article was longer than the context window, but the TL;DR contains details that the notes don't, so that probably wasn't the case.

I've still decided to copy the notes here because they are generally useful and also demonstrate this weird behavior.

TL;DR (by GPT-4 🤖)

The article discusses the concept of tokenization in large language models like GPT-3/4, LLaMA, and PaLM. These models convert text into tokens (integers) and predict the next tokens. The author explains how English words are usually assigned a single token, while non-English languages often have less efficient tokenization. The article also explores "glitch tokens," which exhibit unusual behavior, and the necessity of counting tokens to ensure OpenAI's models' token limit is not exceeded. The author introduces a Python library called tiktoken and a command-line tool called ttok for this purpose. Understanding tokens can help make sense of how GPT tools generate text.

Notes (by GPT-4 🤖)

Understanding GPT Tokenizers

  • Large language models like GPT-3/4, LLaMA, and PaLM operate in terms of tokens, which are integers representing text. They convert text into tokens and predict the next tokens.
  • OpenAI provides a Tokenizer tool for exploring how tokens work. The author has also built a tool as an Observable notebook.
  • The notebook can convert text to tokens, tokens to text, and run searches against the full token table.

Tokenization Examples

  • English words are usually assigned a single token. For example, "The" is token 464, " dog" is token 3290, and " eats" is token 25365.
  • Capitalization and leading spaces are important in tokenization. For instance, "The" with a capital T is token 464, but " the" with a leading space and a lowercase t is token 262.
  • Languages other than English often have less efficient tokenization. For example, the Spanish sentence "El perro come las manzanas" is encoded into seven tokens, while the English equivalent "The dog eats the apples" is encoded into five tokens (see the snippet after this list).
  • Some languages may have single characters that encode to multiple tokens, such as certain Japanese characters.
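
The token IDs and counts quoted above can be reproduced with the tiktoken library mentioned later in the notes. A minimal sketch, assuming tiktoken is installed and that these IDs come from the GPT-2/GPT-3-era vocabulary (tiktoken's "gpt2" encoding):

```python
# Minimal sketch: reproduce the English vs. Spanish token counts above.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2/GPT-3-era vocabulary

english = "The dog eats the apples"
spanish = "El perro come las manzanas"

english_tokens = enc.encode(english)
spanish_tokens = enc.encode(spanish)

# The first three IDs should match 464, 3290, 25365 from the notes above.
print(len(english_tokens), english_tokens)
print(len(spanish_tokens), spanish_tokens)
```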

Glitch Tokens and Token Counting

  • There are "glitch tokens" that exhibit unusual behavior. For example, token 23282β€”"djl"β€”is one such glitch token. It's speculated that this token refers to a Reddit user who posted incremented numbers hundreds of thousands of times, and this username ended up getting its own token in the training data.
  • OpenAI's models have a token limit, and it's sometimes necessary to count the number of tokens in a string before passing it to the API to ensure the limit is not exceeded. OpenAI provides a Python library called tiktoken for this purpose.
  • The author also introduces a command-line tool called ttok, which can count tokens in text and truncate text down to a specified number of tokens (a Python equivalent is sketched below).
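
A rough Python equivalent of the counting and truncating described above, again using tiktoken (not the exact ttok implementation; the 4,000-token budget below is just an illustrative number, not a claim about any particular model):

```python
# Sketch: count tokens and truncate a prompt to fit under a token budget.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def truncate_to_tokens(text: str, limit: int) -> str:
    # Encode, keep the first `limit` tokens, and decode back to text.
    return enc.decode(enc.encode(text)[:limit])

prompt = "Some very long article text ... " * 1000
if count_tokens(prompt) > 4000:            # illustrative budget
    prompt = truncate_to_tokens(prompt, 4000)
print(count_tokens(prompt))
```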

Token Generation

  • Understanding tokens can help make sense of how GPT tools generate text. For example, names not in the dictionary, like "Pelly", take multiple tokens, but "Captain Gulliver" outputs the token "Captain" as a single chunk.
submitted 1 year ago* (last edited 1 year ago) by sisyphean@programming.dev to c/auai@programming.dev

TL;DR (by GPT-4 🤖):

Prompt Engineering, or In-Context Prompting, is a method of guiding large language models (LLMs) towards desired outcomes without changing the model weights. The article discusses various techniques such as basic prompting, instruction prompting, self-consistency sampling, Chain-of-Thought (CoT) prompting, automatic prompt design, augmented language models, retrieval, and the use of programming languages and external APIs. The effectiveness of these techniques can vary significantly among models, necessitating extensive experimentation and heuristic approaches. The article emphasizes the importance of selecting diverse and relevant examples, giving precise instructions, and using external tools to enhance the model's reasoning skills and knowledge base.

Notes (by GPT-4 🤖):

Prompt Engineering: An Overview

  • Introduction
    • Prompt Engineering, also known as In-Context Prompting, is a method to guide the behavior of large language models (LLMs) towards desired outcomes without updating the model weights.
    • The effectiveness of prompt engineering methods can vary significantly among models, necessitating extensive experimentation and heuristic approaches.
    • This article focuses on prompt engineering for autoregressive language models, excluding cloze tests, image generation, and multimodal models.
  • Basic Prompting
    • Zero-shot and few-shot learning are the two most basic approaches for prompting the model.
    • Zero-shot learning involves feeding the task text to the model and asking for results.
    • Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task.
  • Tips for Example Selection and Ordering
    • Examples should be chosen that are semantically similar to the test example.
    • The selection of examples should be diverse, relevant to the test sample, and in random order to avoid biases.
  • Instruction Prompting
    • Instruction prompting involves giving the model direct instructions, which can be more token-efficient than few-shot learning.
    • Models like InstructGPT are fine-tuned with high-quality tuples of (task instruction, input, ground truth output) to better understand user intention and follow instructions.
  • Self-Consistency Sampling
    • Self-consistency sampling involves sampling multiple outputs and selecting the best one out of these candidates.
    • The criteria for selecting the best candidate can vary from task to task.
  • Chain-of-Thought (CoT) Prompting
    • CoT prompting generates a sequence of short sentences that describe the reasoning steps one by one, leading to the final answer.
    • CoT prompting can be either few-shot or zero-shot (a toy example of these prompt styles follows these notes).
  • Automatic Prompt Design
    • Automatic Prompt Design involves treating prompts as trainable parameters and optimizing them directly on the embedding space via gradient descent.
  • Augmented Language Models
    • Augmented Language Models are models that have been enhanced with reasoning skills and the ability to use external tools.
  • Retrieval
    • Retrieval is used for tasks that require knowledge more recent than the model's pretraining cutoff, or knowledge from an internal/private knowledge base.
    • Many methods for Open Domain Question Answering depend on first doing retrieval over a knowledge base and then incorporating the retrieved content as part of the prompt.
  • Programming Language and External APIs
    • Some models generate programming language statements to resolve natural language reasoning problems, offloading the solution step to a runtime such as a Python interpreter.
    • Other models are augmented with text-to-text API calls, guiding the model to generate API call requests and append the returned result to the text sequence.
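
The notes above stay abstract, so here is a toy illustration of the difference between zero-shot, few-shot, and few-shot chain-of-thought prompting. The task, reviews, and wording are invented for illustration and are not taken from the article; no particular model or API is assumed.

```python
# Toy sketch: zero-shot vs. few-shot vs. few-shot chain-of-thought prompts.
task = "Decide whether the following review is positive or negative."
review = "The battery died after two days, but the screen is gorgeous."

zero_shot = f"{task}\nReview: {review}\nAnswer:"

few_shot = (
    f"{task}\n"
    "Review: Absolutely loved it, works perfectly.\nAnswer: positive\n"
    "Review: Broke within a week, waste of money.\nAnswer: negative\n"
    f"Review: {review}\nAnswer:"
)

# CoT demonstrations add an explicit reasoning step before each answer,
# nudging the model to reason step by step on the new input as well.
few_shot_cot = (
    f"{task}\n"
    "Review: Broke within a week, waste of money.\n"
    "Reasoning: The reviewer reports a defect and regrets the purchase, "
    "so the sentiment is negative.\nAnswer: negative\n"
    f"Review: {review}\nReasoning:"
)

print(zero_shot)
print(few_shot)
print(few_shot_cot)
```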

cross-posted from: https://programming.dev/post/222613

Although I prefer the Pro Git book, it's clear that different resources are helpful to different people. For those looking to get an understanding of Git, I've linked to Git for Beginners: Zero to Hero 🐙

The author of "Git for Beginners: Zero to Hero πŸ™" posted the following on Reddit:

Hey there folks!

I've rewritten the git tutorial I've used over the years whenever newbies at work and friends come to me with complex questions but lack the git basics to actually learn.

After discussing my git shortcuts and aliases elsewhere and over DMs it was suggested to me that I share it here.

I hope it helps even a couple of y'all looking to either refresh, jumpstart, or get a good grasp of how common git concepts relate to one another!

It goes without saying that any and all feedback is welcome and appreciated 👍

TL;DR: re-wrote a git tutorial that has helped friends and colleagues get a better grasp of git: https://jdsalaro.com/blog/git-tutorial/

EDIT:

I've been a bit overwhelmed by the support and willingness to provide feedback, so I've enabled hypothes.is on https://jdsalaro.com for /u/NervousQuokka and anyone else wanting to chime in. You can now highlight and comment on snippets. ⚠️ Please join the feedback@jdsalaro group via this link https://hypothes.is/groups/BrRxenZW/feedback-jdsalaro so any highlights, comments, and notes are visible to me and stay nicely grouped. Using hypothes.is for this is an experiment for me, so let's see how it goes :)

https://old.reddit.com/r/learnprogramming/comments/14i14jv/rewrote_my_zero_to_hero_git_tutorial_and_was_told/

cross-posted from: https://programming.dev/post/216322

From the “About” section:

goblin.tools is a collection of small, simple, single-task tools, mostly designed to help neurodivergent people with tasks they find overwhelming or difficult.

Most tools will use AI technologies in the back-end to achieve their goals. Currently this includes OpenAI's models. As the tools and backend improve, the intent is to move to an open source alternative.

The AI models used are general purpose models, and so the accuracy of their output can vary. Nothing returned by any of the tools should be taken as a statement of truth, only guesswork. Please use your own knowledge and experience to judge whether the result you get is valid.
