Programming

IEEE Spectrum Announces Top Programming Languages of 2023: Python and SQL (ieee.org) 102

Last week IEEE Spectrum released its 10th annual rankings of the Top Programming Languages. It chose a top language for each of three categories: actively used among typical IEEE members and working software engineers, in demand by employers, and "in the zeitgeist".

The results? This year, Python doesn't just remain No. 1 in our general "Spectrum" ranking — which is weighted to reflect the interests of the typical IEEE member — but it widens its lead.

Python's increased dominance appears to be largely at the expense of smaller, more specialized languages. It has become the jack-of-all-trades language — and the master of some, such as AI, where powerful and extensive libraries make it ubiquitous. And although Moore's Law is winding down for high-end computing, low-end microcontrollers are still benefiting from performance gains, which means there's now enough computing power available on a US $0.70 CPU to make Python a contender in embedded development, despite the overhead of an interpreter. Python also looks to be solidifying its position for the long term: Many children and teens now program their first game or blink their first LED using Python. They can then move seamlessly into more advanced domains, and even get a job, with the same language.

But Python alone does not make a career. In our "Jobs" ranking, it is SQL that shines at No. 1. Ironically, though, you're very unlikely to get a job as a pure SQL programmer. Instead, employers love, love, love seeing SQL skills in tandem with some other language such as Java or C++. With today's distributed architectures, a lot of business-critical data live in SQL databases...

But don't let Python and SQL's rankings fool you: Programming is still far from becoming a monoculture. Java and the various C-like languages outweigh Python in their combined popularity, especially for high-performance or resource-sensitive tasks where that interpreter overhead of Python's is still too costly (although there are a number of attempts to make Python more competitive on that front). And there are software ecologies that are resistant to being absorbed into Python for other reasons.

The article cites the statistical analysis/visualization language R, as well as Fortran and Cobol, as languages that are hard to port code from or that have accumulated large already-validated codebases. But Python also remains at #1 in their third "Trending" category — with Java in second there and on the general "IEEE Spectrum" list.

JavaScript appears below Python and Java on all three lists: immediately below them on the "Trending" and "Jobs" lists, but two positions further down on the general "Spectrum" list (below C++ and C).

The metrics used for the calculation include the number of hits on Google, recent questions on Stack Overflow, tags on Discord, mentions in IEEE's library of journal articles and its CareerBuilder job site, and language use in starred GitHub repositories and number of new programming books.
Google

Google Launches BigQuery Studio, a New Way To Work With Data (techcrunch.com) 9

An anonymous reader quotes a report from TechCrunch: Companies increasingly see the value in mining their data for deeper insights. According to a NewVantage survey, 97.6% of major worldwide organizations are focusing investments into big data and AI. But challenges stand in the way of executing big data analytics. One recent poll found that 65% of organizations feel they have "too much" data to analyze. Google's proposed solution is BigQuery Studio, a new service within BigQuery, its fully managed serverless data warehouse, that provides a single experience to edit programming languages including SQL, Python and Spark to run analytics and machine learning workloads at "petabyte scale." BigQuery Studio is available in preview as of this week.

"BigQuery Studio is a new experience that really puts people who are working on data on the one side and people working on AI on the other side in a common environment," Gerrit Kazmaier, VP and GM of data and analytics at Google, told TechCrunch in a phone interview. "It basically provides access to all of the services that those people need to work -- there's an element of simplification on the user experience side." BigQuery Studio is designed to enable users to discover, explore, analyze and predict data. Users can start in a programming notebook to validate and prep data, then open that notebook in other services, including Vertex AI, Google's managed machine learning platform, to continue their work with more specialized AI infrastructure and tooling.

With BigQuery Studio, teams can directly access data wherever they're working, Kazmaier says. And they have added controls for "enterprise-level" governance, regulation and compliance. "[BigQuery Studio shows] how data is being generated to how it's being processed and how it's being used in AI models, which sounds technical, but it's really important," he added. "You can push down code for machine learning models directly into BigQuery as infrastructure, and that means that you can evaluate it at scale."

Programming

Creators of Python, Java, TypeScript, and Smalltalk Will Make a Joint Appearance for Charity (pydata.org) 45

The creators of four programming languages will appear together onstage for a historic conversation on September 19th.

- Adele Goldberg — Smalltalk
- Guido van Rossum — Python
- Anders Hejlsberg — Turbo Pascal, C#, TypeScript
- James Gosling — Java

The announcement describes it as "a conversation about programming language design." The charity event brings together a unique group of computer science pioneers, unlike any gathering held before, for what promises to be a fantastic night of discussion as the panel delves into the past and future of programming language creation.
It's a fundraiser for two groups. NumFOCUS is a nonprofit charity sponsoring nearly all the major tools in the Python data science stack (including Jupyter, NumPy, pandas, and Matplotlib), and it's also the group behind the PyData conferences on open source data tools. And the Last Mile Education Fund offers financial support for low-income, underrepresented students. It's being billed as the "inaugural charity event" of PyData Seattle.

This happened once before in 2019, when Puget Sound Programming Python arranged a four-way discussion with Python creator Guido van Rossum, Java creator James Gosling, Perl creator Larry Wall, and Anders Hejlsberg (Turbo Pascal, C#, TypeScript). They held a 90-minute discussion about "language design, the universe, and everything" as a benefit for CSforALL (a group promoting computer science classes at every grade level). During that discussion Gosling shared how Java "started out as kind of 'Do a better C', and it got out of control. The rest of the project really ended up just providing the context." And Anders Hejlsberg told the audience that TypeScript was inspired by massive "write-only" JavaScript code bases.

In their discussion on variable typing and its use in IDEs, Gosling mocked what he called the "real men use vi" mentality, leading to a lively back and forth. Perl's Larry Wall later acknowledged the importance of types and the careful consideration that went into implementing them for Perl 6, but also shared his unique perspective as a long-time designer of programming languages. "I think IDEs make language developers lazy."

At the end of the event, they all agreed that the most rewarding part of language design was the people: the excitement, the gratitude, and the sight of a community helping others within it.
AI

Meta Releases Code Llama, a Code-Generating AI Model (techcrunch.com) 20

Meta, intent on making a splash in a generative AI space rife with competition, is on something of an open source tear. From a report: Following the release of AI models for generating text, translating languages and creating audio, the company today open sourced Code Llama, a machine learning system that can generate and explain code in natural language -- specifically English. Akin to GitHub Copilot and Amazon CodeWhisperer, as well as open source AI-powered code generators like StarCoder, StableCode and PolyCoder, Code Llama can complete code and debug existing code across a range of programming languages, including Python, C++, Java, PHP, TypeScript, C# and Bash.

"At Meta, we believe that AI models, but large language models for coding in particular, benefit most from an open approach, both in terms of innovation and safety," Meta wrote in a blog post shared with TechCrunch. "Publicly available, code-specific models can facilitate the development of new technologies that improve people's lives. By releasing code models like Code Llama, the entire community can evaluate their capabilities, identify issues and fix vulnerabilities." Code Llama, which is available in several flavors, including a version optimized for Python and a version fine-tuned to understand instructions (e.g. "Write me a function that outputs the fibonacci sequence"), is based on the Llama 2 text-generating model that Meta open sourced earlier this month. While Llama 2 could generate code, it wasn't necessarily good code -- certainly not up to the quality a purpose-built model like Copilot could produce.

Microsoft

Microsoft Announces Python In Excel 92

theodp writes: On Tuesday, Microsoft announced the Public Preview of Python in Excel, which "runs securely on the Microsoft Cloud".

From the Home Office in Redmond: "Python is one of the most popular programming languages today, loved by businesses and students alike and Excel is an essential tool to organize, manipulate and analyze all kinds of data. But, until now, there hasn't been an easy way to make those two worlds work together. Today, we are excited to introduce the Public Preview of Python in Excel -- making it possible to integrate Python and Excel analytics within the same Excel grid for uninterrupted workflow. Python in Excel combines Python's powerful data analysis and visualization libraries with Excel's features you know and love. You can manipulate and explore data in Excel using Python plots and libraries, and then use Excel's formulas, charts and PivotTables to further refine your insights...We're partnering with Anaconda, a leading enterprise grade Python repository used by tens of millions of data practitioners worldwide. Python in Excel leverages Anaconda Distribution for Python running in Azure, which includes the most popular Python libraries such as pandas for data manipulation, statsmodels for advanced statistical modeling, and Matplotlib and seaborn for data visualization....While in Preview, Python in Excel will be included with your Microsoft 365 subscription. After the Preview, some functionality will be restricted without a paid license."

Python creator Guido van Rossum, now a Microsoft Distinguished Engineer, helped define the architecture for Python in Excel and had this to say: "I'm excited that this excellent, tight integration of Python and Excel is now seeing the light of day. I expect that both communities will find interesting new uses in this collaboration, amplifying each partner's abilities. When I joined Microsoft three years ago, I would not have dreamed this would be possible. The Excel team excels!"
Google

Google Launches Project IDX, a New AI-Enabled Browser-Based Development Environment (techcrunch.com) 17

An anonymous reader quotes a report from TechCrunch: Google today announced the launch of Project IDX, its foray into offering an AI-enabled browser-based development environment for building full-stack web and multiplatform apps. It currently supports frameworks like Angular, Flutter, Next.js, React, Svelte and Vue, and languages like JavaScript and Dart, with support for Python, Go and others in the works. Google did not build a new IDE (integrated development environment) when it created IDX. Instead, it is using Visual Studio Code -- Open Source as the basis of its project. This surely allowed the team to focus on the integration with Codey, Google's PaLM 2-based foundation model for programming tasks. Thanks to Codey, IDX supports smart code completion, a ChatGPT/Bard-like chatbot that can help developers with general coding questions as well as those related specifically to the code you are working on (including the ability to explain it) and the ability to add contextual code actions like "add comments."

"We spend a lot of time writing code, and recent advances in AI have created big opportunities to make that time more productive," the IDX team explains in today's announcement. "With Project IDX, we're exploring how Google's innovations in AI -- including the Codey and PaLM 2 models powering Studio Bot in Android Studio, Duet in Google Cloud and more -- can help you not only write code faster, but also write higher-quality code." As a cloud-based IDE, it's no surprise that Project IDX integrates with Google's own Firebase Hosting (and Google Cloud Functions) and allows developers to bring in existing code from the GitHub repository. Every workspace has access to a Linux-based VM (virtual machine) and, soon, embedded Android and iOS simulators right in the browser.

Programming

Should a Variable's Type Come After Its Name? (benhoyt.com) 321

Canonical engineering manager Ben Hoyt believes that a variable's name is more important than its type, so "the name should be more prominent and come first in declarations." In many popular programming languages, including C, C++, Java, and C#, when you define a field or variable, you write the type before the name. For example (in C++):

// Struct definition
struct person {
    std::string name;
    std::string email;
    int age;
};


In other languages, including Go, Rust, TypeScript, and Python (with type hints), you write the name before the type. For example (in Go):

// Struct definition
type Person struct {
    Name  string
    Email string
    Age   int
}
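The article also lists Python (with type hints) among the name-before-type languages. For comparison, the same struct could be sketched as a Python dataclass (this Python version is illustrative, not from the article):

```python
from dataclasses import dataclass

# Python type hints also put the name first, then the type
@dataclass
class Person:
    name: str
    email: str
    age: int

p = Person(name="Ada", email="ada@example.com", age=36)
```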

There's a nice answer in the Go FAQ about why Go chose this order: "Why are declarations backwards?". It starts with "they're only backwards if you're used to C", which is a good point — name-before-type has a long history in languages like Pascal. In fact, Go's type declaration syntax (and packages) were directly inspired by Pascal.

The FAQ goes on to point out that parsing is simpler with name-before-type, and declaring multiple variables is less error-prone than in C. In C, the following declares x to be a pointer, but (surprisingly at first!) y to be a normal integer:

int* x, y;

Whereas the equivalent in Go does what you'd expect, declaring both to be pointers:

var x, y *int

The Go blog even has an in-depth article by Rob Pike on Go's Declaration Syntax, which describes more of the advantages of Go's syntax over C's, particularly with arrays and function pointers.

Oddly, the article only hints at what I think is the more important reason to prefer name-before-type for everyday programming: it's clearer.

Hoyt argues a variable's name has more meaning (semantically) — pointing out dynamically-typed languages like Python and Ruby don't even need types, and that languages like Java, Go, C++ and C# now include type inference.

"I think the takeaway is this: we can't change the past, but if you're creating a new language, please put names before types!"
Python

Python's Steering Council Plans to Make Its 'Global Interpreter Lock' Optional (python.org) 21

Python's Global Interpreter Lock "allows only one thread to hold the control of the Python interpreter," according to the tutorial site Real Python. (They add, "it can be a performance bottleneck in CPU-bound and multi-threaded code.")
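The bottleneck is easy to demonstrate: time a CPU-bound function run serially versus on two threads. With the GIL, the threaded version gains little or nothing, because only one thread executes Python bytecode at a time (a rough illustration with an arbitrary workload size, not a rigorous benchmark):

```python
import threading
import time

def countdown(n):
    # pure-Python CPU-bound work; the GIL serializes it across threads
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```

On a no-GIL build, the two threads could run on separate cores and the threaded time would drop toward half the serial time.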

Friday the Python Steering Council "announced its intent to accept PEP 703 (Making the Global Interpreter Lock Optional in CPython), with initial support possibly showing up in the 3.13 release," reports LWN.net.

From the Steering Council's announcement: It's clear that the overall sentiment is positive, both for the general idea and for PEP 703 specifically. The Steering Council is also largely positive on both. We intend to accept PEP 703, although we're still working on the acceptance details...

Our base assumptions are:

- Long-term (probably 5+ years), the no-GIL build should be the only build. We do not want to create a permanent split between with-GIL and no-GIL builds (and extension modules).

- We want to be very careful with backward compatibility. We do not want another Python 3 situation, so any changes in third-party code needed to accommodate no-GIL builds should just work in with-GIL builds (although backward compatibility with older Python versions will still need to be addressed). This is not Python 4. We are still considering the requirements we want to place on ABI compatibility and other details for the two builds and the effect on backward compatibility.

- Before we commit to switching entirely to the no-GIL build, we need to see community support for it. We can't just flip the default and expect the community to figure out what work they need to do to support it. We, the core devs, need to gain experience with the new build mode and all it entails. We will probably need to figure out new C APIs and Python APIs as we sort out thread safety in existing code. We also need to bring along the rest of the Python community as we gain those insights and make sure the changes we want to make, and the changes we want them to make, are palatable.

- We want to be able to change our mind if it turns out, any time before we make no-GIL the default, that it's just going to be too disruptive for too little gain. Such a decision could mean rolling back all of the work, so until we're certain we want to make no-GIL the default, code specific to no-GIL should be somewhat identifiable.

The current plan is to "add the no-GIL build as an experimental build mode, presumably in 3.13... [A]fter we have confidence that there is enough community support to make production use of no-GIL viable, we make the no-GIL build supported but not the default (yet), and set a target date/Python version for making it the default... We expect this to take at least a year or two, possibly more."

"Long-term, we want no-GIL to be the default, and to remove any vestiges of the GIL (without unnecessarily breaking backward compatibility)... We think it may take as much as five years to get to this stage."
Programming

Is C++ Gaining in Popularity? (i-programmer.info) 106

An anonymous reader shares this report from Dice.com: C++ is enjoying a surge in popularity, according to the latest update to the TIOBE Index, which tracks programming languages' "buzz."

C++ currently sits right behind C and Python on TIOBE's list. "A few months ago, the C++ programming language claimed position 3 of the TIOBE index (at the expense of Java). But C++ has not finished its rise. C seems to be its next victim," added the note accompanying the data... ["At the moment, the gap between the two is only 0.76%."]

Matlab, Scratch, and Rust also match their all-time-high records, at positions #10, #12, and #17 respectively.

So here, according to TIOBE, are the 10 most popular programming languages:

1. Python
2. C
3. C++
4. Java
5. C#
6. JavaScript
7. Visual Basic
8. SQL
9. PHP
10. MATLAB

The site I Programmer digs deeper: C++ was the only one of the top four languages to see a positive year-on-year change in its percentage rating, adding 0.79% to stand at 10.8%. Python had the smallest loss of the entire Top 20, -0.01%, leaving it with a share of 13.42%, while Visual Basic had the greatest loss at -2.07%. This, combined with JavaScript gaining 1.34%, led to JavaScript overtaking Visual Basic to occupy #6, its highest-ever ranking in the TIOBE Index.
They also note that COBOL "had a 3-month rise going from a share of 0.41% in April to 0.86% in July which moved it into #20 on the index."
Programming

Does the New 'Mojo' Programming Language Offer a Faster Superset of Python? (infoworld.com) 71

InfoWorld explores how the new Mojo programming language "resembles Python, how it's different, and what it has to offer." The newly unveiled Mojo language is being promoted as the best of multiple worlds: the ease of use and clear syntax of Python, with the speed and memory safety of Rust. Those are bold claims, and since Mojo is still in the very early stages of development, it will be some time before users can see for themselves how the language lives up to them. But Mojo's originator — a company named Modular — has provided early access [through a limited-enrollment preview program] to an online playground: a Jupyter Notebook environment where users can run Mojo code and learn about the language's features and behavior...

Mojo can be described as a "superset" of Python. Programs written in Python are valid Mojo programs, although some Python behaviors haven't yet been implemented... It's also possible to use the actual Python runtime for working with existing Python modules, although there is a performance cost. When Mojo introduces new syntax, it's for system-level programming features, chiefly manual memory handling. In other words, you can write Python code (or something almost exactly like it) for casual use cases, then use Mojo for more advanced, performance-intensive programming scenarios... Mojo's other big difference from Python is that Mojo's not interpreted through a runtime, as Python is. Mojo is compiled ahead-of-time to machine-native code, using the LLVM toolchain. To that end, the best performance comes from using features specific to Mojo. Python features are likely to come at the cost of emulating Python's dynamic behaviors, which are inherently slow — or again, by just using the Python runtime.

Many of Mojo's native language features do one of two things. They're either entirely new features not found in Python at all, or expansions of a Python feature that make it more performant, although with less of Python's dynamism.

For example, Mojo has its own fn keyword which defines a function with explicitly-typed and immutable-by-default arguments, and its own struct keyword which is less like a Python class and more like its C/C++ and Rust counterpart "with fixed layouts determined at compile time but optimized for machine-native speed."

But "At a glance, the code closely resembles Python. Even the new Mojo-specific keywords integrate well with existing Python syntax, so you can run your eye down the code and get a general idea of what's happening." And then there's the speed... The notebook demos also give examples of how Mojo code can be accelerated via parallelism, vectorizing, and "tiling" (increasing cache locality for operations). One of the demos, a 128x128 matrix multiplication, yielded a claimed 17x speedup over Python (using the Python runtime in the Mojo playground) simply by running as-is with no special modification. Adding type annotations brought the speedup to 1,866x, vectorized operations to 8,500x, and parallelization to 15,000x.
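For context, the pure-Python baseline in matrix-multiplication benchmarks of this kind is typically a naive triple-nested loop, where every index and float operation is dispatched dynamically. A sketch of such a baseline (not Modular's exact demo code):

```python
def matmul(A, B):
    """Naive O(n^3) matrix multiply over lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C
```

It is this fully interpreted inner loop that compiled, type-annotated, vectorized code can beat by the orders of magnitude the demos claim.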
AI

Will Productivity Gains from AI-Generated Code Be Offset by the Need to Maintain and Review It? (zdnet.com) 95

ZDNet asks the million-dollar question. "Despite the potential for vast productivity gains from generative AI tools such as ChatGPT or GitHub Copilot, will technology professionals' jobs actually grow more complicated?" People can now pump out code on demand in an abundance of languages, from Java to Python, along with helpful recommendations. Already, 95% of developers in a recent survey from Sourcegraph report they use Copilot, ChatGPT, and other gen AI tools this way.

But auto-generating new code only addresses part of the problem in enterprises that already maintain unwieldy codebases, and require high levels of cohesion, accountability, and security.

For starters, security and quality assurance tasks associated with software jobs aren't going to go away anytime soon. "For programmers and software engineers, ChatGPT and other large language models help create code in almost any language," says Andy Thurai, analyst with Constellation Research, before talking about security concerns. "However, most of the code that is generated is security-vulnerable and might not pass enterprise-grade code. So, while AI can help accelerate coding, care should be taken to analyze the code, find vulnerabilities, and fix it, which would take away some of the productivity increase that AI vendors tout about."

Then there's code sprawl. An analogy to the rollout of generative AI in coding is the introduction of cloud computing, which seemed to simplify application acquisition when first rolled out, and now means a tangle of services to be managed. The relative ease of generating code via AI will contribute to an ever-expanding codebase — what the Sourcegraph survey authors refer to as "Big Code". A majority of the 500 developers in the survey are concerned about managing all this new code, along with code sprawl, and its contribution to technical debt. Even before generative AI, close to eight in 10 say their codebase grew five times over the last three years, and a similar number struggle with understanding existing code generated by others.

So, the productivity prospects for generative AI in programming are a mixed bag.

Programming

Google's Bard AI Can Now Write and Execute Code To Answer a Question 19

In a blog post on Wednesday, Google said Bard is getting better at logic and reasoning. "Google says that now when you ask Bard a 'computational' task like math or string manipulation, instead of showing the output of the language model, that language model will instead write a program, execute that program, and then show the output of that program to the user as an answer," reports Ars Technica. From the report: Google's blog post provides the example input of "Reverse the word 'Lollipop' for me." ChatGPT flubs this question and provides the incorrect answer "pillopoL," because language models see the world in chunks of words, or "tokens," and they just aren't good at this. Bard gets the output correct as "popilloL," but more interesting is that it also includes the Python code it wrote to answer the question. That's neat for programming-minded people to see under the hood, but wow, is that probably the scariest output ever for regular people. It's also not particularly relevant. Imagine if Gmail showed you a block of code when you just asked it to fetch email. It's weird. Just do the job you were asked to do, Bard.

Google likens an AI model writing a program to humans doing long division in that it's a different mode of thinking [...]. Google says this "writing code on the fly" method will also be used for questions like: "What are the prime factors of 15683615?" and "Calculate the growth rate of my savings." The company says, "So far, we've seen this method improve the accuracy of Bard's responses to computation-based word and math problems in our internal challenge datasets by approximately 30%." As usual, Google warns Bard "might not get it right" due to interpreting your question wrong or just, like all of us, writing code that doesn't work the first time. Bard is coding up answers on the fly right now if you want to give it a shot at bard.google.com.
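The kinds of programs involved in these example prompts are short, ordinary Python. A sketch of the sort of code such a tool might generate (illustrative only, not Bard's actual output):

```python
# Reverse a string with slice notation
word = "Lollipop"
reversed_word = word[::-1]  # "popilloL"

# Prime factorization by trial division
def prime_factors(n):
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(reversed_word)
print(prime_factors(15683615))  # [5, 151, 20773]
```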
Programming

Stanford Golf Phenom Rose Zhang Turns Pro, Vows To 'Never Code Again' 75

theodp writes: Golf reports that amateur golf legend Rose Zhang will compete for the first time as a professional when she tees off in the first round of the Mizuho Americas Open Thursday. Golf news is rarely fodder for Slashdot discussion, but when the 20-year-old Stanford student (who plans to complete her degree after a leave of absence) was asked by Golf to identify her toughest class, she threw CS under the bus.

"CS 106A," Zhang replied, referring to a computer science course. "Currently and still trying to grind in that class. It's been a little unfortunate for me. I'm not a CS major. Will never code again after this class." Back in April, Zhang expressed some doubts about being able to juggle the demands of an already-renowned golf career and CS 106A. "I'll be super, super busy," Zhang said in an interview. "I'm planning on taking CS 106A. I don't know if it's a smart decision but it's kind of an essential intro CS class into Stanford so I'm going to try to navigate that, balance that out."

The Stanford Daily reports that CS 106A: Programming Methodology is an introductory programming course taken by 1,600+ students from all academic disciplines each year (2015 Slashdot post on CS 106A's growing pains). According to the syllabus, CS 106A "uses the Python programming language" and there's "no prior programming experience required," although the schedule indicates a lot of ground is covered for someone new to coding (the same could be said of Harvard's famed CS50).

Lest some take Zhang to task for the sin of stating programming is hard, consider that Stanford's CS 106A website suggests the same, reporting that the median score on the midterm exam was only 68%, despite a plethora of review materials and sessions. CS 106A students were offered the chance to submit formal 'regrade requests' to try to improve their midterm scores and can also vie for "a Jamba Juice gift card and 100% on the final exam" by entering a Python programming contest -- one prize will be awarded for "Aesthetic merit", another for "Algorithmic sophistication" (a number of runners-up will be awarded "a grade boost similar to getting a + on one of their assignments").
Python

PyPI is Reducing Stored IP Address Data (theregister.com) 10

The PyPI registry of open source Python packages "began evaluating ways to reduce the amount of identifying information that it stores," reports the Register, "even before the U.S. Justice Department came asking for data on suspect users."

But now, "the Python community package registry wants developers to understand that it's working to minimize the user data that it stores." The goal is not to be unable to respond to lawful requests for information; rather it's to store only the minimum amount of data necessary so as not to expose users to unnecessary privacy intrusion. Coincidentally, data minimization may prevent organizations from becoming a preferred source of on-demand surveillance: having excessive amounts of information about users invites legal demands, which staff then have to handle...

Mike Fiedler, a member of the PyPI admin team, said in a statement on Friday that the organization's effort to improve user privacy and security dates back to 2020. Since the receipt of the subpoenas in March and April, that effort has been reinvigorated.

Much of the concern focuses on IP address data, which gets stored in conjunction with web log access; user events such as logins; project events including uploads; events associated with recently introduced organizations; and administrative PyPI journal entries. According to Fiedler, PyPI was able to stop storing IP data for journal entries — an append-only transaction log — because these were only exposed to administrators... To obscure IP addresses, PyPI is salting them — adding an arbitrary value — and then hashing them — running the data through a one-way scrambling function that creates a value called a hash. This provides a way to store a reference to potentially identifying data without actually storing raw data... PyPI has been using its CDN provider Fastly to pass along a salted hash of the IP address for requests via a custom header, along with broad GeoIP data (the country and city where the user is located), and is using that instead of the raw IP address. In April, the registry adopted code changes for hashing and salting IP addresses for requests that PyPI handles directly in Warehouse, the web application that implements the official Python package index.
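Salting and hashing of this kind can be sketched in a few lines (a generic illustration of the technique, not PyPI's actual implementation):

```python
import hashlib
import secrets

# An arbitrary secret value mixed in before hashing, so that the raw
# address never needs to be stored but equal inputs still hash equally
SALT = secrets.token_hex(16)

def hash_ip(ip: str, salt: str = SALT) -> str:
    """One-way hash of a salted IP address."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()

stored = hash_ip("203.0.113.7")
```

The same IP with the same salt always produces the same hash, so repeated requests from one address remain linkable for abuse detection, while the hash cannot be reversed to recover the address.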

And over the past few days, it has been replacing IP addresses in the PyPI user interface with geolocation data. PyPI still relies on IP address information to identify abuse — the creation of malicious packages, harassments, and so on — but Fiedler says even that is being looked at. "We're thinking about how to manage that without storing IP data, but we're not there yet," he said. Fiedler says the PyPI team will be weighing whether it can remove IP data from event history records after a period of time and whether the service can handle all its requests via CDN.
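The salt-and-hash scheme described above can be sketched in a few lines. PyPI's actual salt, scheme, and header names are not public, so the construction below is illustrative only:

```python
import hashlib
import secrets

# A server-side salt; the value and the exact construction here are
# illustrative, not PyPI's actual implementation.
SALT = secrets.token_bytes(16)

def obscure_ip(ip: str) -> str:
    """Return a salted one-way hash of an IP address.

    The same address always maps to the same digest, so abuse
    patterns can still be correlated, but the raw IP is never
    stored and cannot be recovered from the hash.
    """
    return hashlib.sha256(SALT + ip.encode()).hexdigest()

h1 = obscure_ip("203.0.113.7")
h2 = obscure_ip("203.0.113.7")
assert h1 == h2                  # stable reference for the same address
assert "203.0.113.7" not in h1   # raw address does not appear in storage
```

Because the hash is deterministic per salt, rate limiting and abuse detection can key on the digest instead of the address itself.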

Python

Python 3.12 Brings New Features and Fixes (infoworld.com) 30

"The Python programming language releases new versions yearly, with a feature-locked beta release in the first half of the year and the final release toward the end of the year," writes InfoWorld.

Python 3.12 beta 1 has just been released, and InfoWorld compiled a list of its most significant new features. Some highlights: - The widely used Linux profiler tool perf works with Python, but only returns information about what's happening at the C level in the Python runtime. Information about actual Python program functions doesn't show up. Python 3.12 adds an opt-in mode that allows perf to harvest details about Python programs...
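The opt-in can be made at startup (`python -X perf` or the `PYTHONPERFSUPPORT=1` environment variable) or at runtime; a minimal sketch of the runtime route, which only works on Linux under CPython 3.12 or later:

```python
import sys

# Enable perf trampolines so `perf record` can map Python frames to
# function names; Linux-only, CPython 3.12+. The startup flags
# (-X perf, PYTHONPERFSUPPORT=1) are the more common route.
supported = sys.platform == "linux" and sys.version_info >= (3, 12)
if supported:
    sys.activate_stack_trampoline("perf")   # start emitting perf map entries
    # ... run the workload to be profiled under `perf record` ...
    sys.deactivate_stack_trampoline()
print("perf trampoline supported:", supported)
```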

- Programs can run as much as an order of magnitude slower when run through a debugger or profiler. PEP 669 provides hooks for code object events that profilers and debuggers can attach to, such as the start or end of a function. A tool can register a callback function to fire whenever such an event is triggered. There will still be a performance hit for profiling or debugging, but it'll be greatly reduced...

- Comprehensions, a syntax that lets you quickly construct lists, dictionaries, and sets, are now constructed "inline" rather than by way of temporary objects. The speedup for this has been clocked at around 11% for a real-world case and up to twice as fast for a micro-benchmark.
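One way to observe the change is to disassemble a function containing a comprehension; the result of the `dis` check below is version-dependent:

```python
import dis

def squares(xs):
    return [x * x for x in xs]

assert squares([1, 2, 3]) == [1, 4, 9]

# Before 3.12 the comprehension compiled to a hidden <listcomp> code
# object built with MAKE_FUNCTION and called on every invocation; on
# 3.12+ MAKE_FUNCTION is absent because the loop runs inline in
# squares' own frame, skipping the temporary function object.
opnames = {ins.opname for ins in dis.get_instructions(squares)}
print("builds hidden function:", "MAKE_FUNCTION" in opnames)
```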

- Python's type-hinting syntax, added in Python 3.5, allows linting tools to catch a wide variety of errors ahead of time. With each new version, typing in Python gains features to cover a broader and more granular range of use cases... The type parameter syntax provides a cleaner way to specify types in a generic class, function, or type alias...

- Every object in Python has a reference count that tracks how many times other objects refer to it, including built-in objects like None. PEP 683 allows objects to be treated as "immortal," so that they never have their reference count changed. Making objects immortal has other powerful implications for Python in the long run. It makes it easier to implement multicore scaling, and to implement other optimizations (like avoiding copy-on-write) that would have been hard to implement before.
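Reference counts are visible from Python via `sys.getrefcount`; singletons like None are referenced from virtually everywhere, and under PEP 683 their stored count becomes a pinned sentinel rather than a constantly updated number (exact values vary by version and platform):

```python
import sys

x = object()
# getrefcount reports one extra reference for its own argument.
print(sys.getrefcount(x))

# None is shared by every module and frame. Under PEP 683 (3.12+) it
# is immortal: its count is a sentinel that never changes, which
# removes contended count updates on hot singletons and keeps memory
# pages clean for copy-on-write sharing (e.g. after fork()).
print(sys.getrefcount(None))
```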

- With earlier versions of Python, the base size of an object was 208 bytes. Objects have been refactored multiple times over the last few versions of Python to make them smaller, which doesn't just allow more objects to live in memory but helps with cache locality. As of Python 3.12, the base size of an object is now 96 bytes — less than half of what it used to be.

Python

PyPI Was Subpoenaed 31

The PyPI blog: In March and April 2023, the Python Software Foundation (PSF) received three (3) subpoenas for PyPI user data. All three subpoenas were issued by the United States Department of Justice. The PSF was not provided with context on the legal circumstances surrounding these subpoenas. In total, user data related to five (5) PyPI usernames was requested. The data requested was:

"Names (including subscriber names, user names, and screen names);"
"Addresses (including mailing, residential addresses, business addresses, and email addresses);"
"Connection records;"
"Records of session times and durations, and the temporarily assigned network address (such as Internet Protocol addresses) associated with those sessions;"
"Length of service (including start date) and type of services utilized;"
"Telephone or instrument numbers (including the registration Internet Protocol address);"
"Means and source of payment of any such services (including any credit card or bank account number) and billing records;"
"Records of all Python Package Index (PyPI) packages uploaded by..." given usernames
"IP download logs of any Python Package Index (PyPI) packages uploaded by..." given usernames

The privacy of PyPI users is of utmost concern to PSF and the PyPI Administrators, and we are committed to protecting user data from disclosure whenever possible. In this case, however, PSF determined with the advice of counsel that our only course of action was to provide the requested data. I, as Director of Infrastructure of the Python Software Foundation, fulfilled the requests in consultation with PSF's counsel.

We have waited for the string of subpoenas to subside, though we were committed from the beginning to write and publish this post as a matter of transparency, and as allowed by the lack of a non-disclosure order associated with the subpoenas received in March and April 2023.
Python

Python's PyPI Package Repository Temporarily Halted New Signups, Citing 'Volume of Malicious Projects' (bleepingcomputer.com) 24

On Saturday PyPI, the official third-party registry of open source Python packages, "temporarily suspended new users from signing up, and new projects from being uploaded to the platform" reports BleepingComputer.

"The volume of malicious users and malicious projects being created on the index in the past week has outpaced our ability to respond to it in a timely fashion, especially with multiple PyPI administrators on leave," stated an incident notice posted by PyPI admins Saturday.

Hours ago they posted a four-word update: "Suspension has been lifted." No details were provided, but The Hacker News writes the incident "comes as software registries such as PyPI have proven time and time again to be a popular target for attackers looking to poison the software supply chain and compromise developer environments." Earlier this week, Israeli cybersecurity startup Phylum uncovered an active malware campaign that leverages OpenAI ChatGPT-themed lures to bait developers into downloading a malicious Python module capable of stealing clipboard content in order to hijack cryptocurrency transactions. ReversingLabs, in a similar discovery, identified multiple npm packages named nodejs-encrypt-agent and nodejs-cookie-proxy-agent in the npm repository that drop a trojan called TurkoRat.
AI

Google Colab Promises 'AI-Powered Coding, Free of Charge' (blog.google) 24

Google Colab hosts free cloud-based "executable documents" that, among other things, let you write and run code in your browser (in dozens of languages, including Python).

Over 7 million people, including students, already use Colab, according to a recent post on Google's blog, "and now it's getting even better with advances in AI [with] features like code completions, natural language to code generation and even a code-assisting chatbot."

Google says it will "dramatically increase programming speed, quality, and comprehension." Our first features will focus on code generation. Natural language to code generation helps you generate larger blocks of code, writing whole functions from comments or prompts. [For example: "import data.csv as a dataframe."] The goal here is to reduce the need for writing repetitive code, so you can focus on the more interesting parts of programming and data science. Eligible users in Colab will see a new "Generate" button in their notebooks, allowing them to enter any text prompt to generate code.

For eligible paid users, as you type, you'll see autocomplete suggestions.

We're also bringing the helpfulness of a chatbot directly into Colab. Soon, you'll be able to ask questions directly in Colab like, "How do I import data from Google Sheets?" or "How do I filter a Pandas DataFrame?"

Anyone with an internet connection can access Colab, and use it free of charge... Access to these features will roll out gradually in the coming months, starting with our paid subscribers in the U.S. and then expanding into the free-of-charge tier.

It's powered by Google's "next generation" machine-learning language model PaLM 2 (announced earlier this month), which "excels at popular programming languages like Python and JavaScript, but can also generate specialized code in languages like Prolog, Fortran and Verilog." Colab will use Codey, a family of code models built on PaLM 2... fine-tuned on a large dataset of high quality, permissively licensed code from external sources to improve performance on coding tasks. Plus, the versions of Codey being used to power Colab have been customized especially for Python and for Colab-specific uses.
Programming

'Mojo May Be the Biggest Programming Language Advance In Decades' (www.fast.ai) 126

Mojo is a new programming language developed by Modular that aims to address the performance and deployment limitations of Python in areas like AI model development. After demoing Mojo prior to its launch, Jeremy Howard from the non-profit research group fast.ai said it feels like coding will never be the same again. Here's an excerpt from Howard's article: Modular is a fairly small startup that's only a year old, and only one part of the company is working on the Mojo language. Mojo development was only started recently. It's a small team, working for a short time, so how have they done so much? The key is that Mojo builds on some really powerful foundations. Very few software projects I've seen spend enough time building the right foundations, and as a result they tend to accrue mounds of technical debt. Over time, it becomes harder and harder to add features and fix bugs. In a well designed system, however, every feature is easier to add than the last one, is faster, and has fewer bugs, because the foundations each feature builds upon are getting better and better. Mojo is a well designed system.

At its core is MLIR (Multi-Level Intermediate Representation), which has already been developed for many years, initially kicked off by Chris Lattner at Google. He had recognized what the core foundations for an "AI era programming language" would need, and focused on building them. MLIR was a key piece. Just as LLVM made it dramatically easier for powerful new programming languages to be developed over the last decade (such as Rust, Julia, and Swift, which are all based on LLVM), MLIR provides an even more powerful core to languages that are built on it. Another key enabler of Mojo's rapid development is the decision to use Python as the syntax. Developing and iterating on syntax is one of the most error-prone, complex, and controversial parts of the development of a language. By simply outsourcing that to an existing language (which also happens to be the most widely used language today) that whole piece disappears! The relatively small number of new bits of syntax needed on top of Python then largely fit quite naturally, since the base is already in place.

The next step was to create a minimal Pythonic way to call MLIR directly. That wasn't a big job at all, but it was all that was needed to then create all of Mojo on top of that -- and work directly in Mojo for everything else. That meant that the Mojo devs were able to "dog-food" Mojo when writing Mojo, nearly from the very start. Any time they found something didn't quite work great as they developed Mojo, they could add a needed feature to Mojo itself to make it easier for them to develop the next bit of Mojo!
You can give Mojo a try here.
Google

Google Announces PaLM 2, Its Next Generation Language Model (blog.google) 6

Google, in a blog post: PaLM 2 is a state-of-the-art language model with improved multilingual, reasoning and coding capabilities.

Multilinguality: PaLM 2 [PDF] is more heavily trained on multilingual text, spanning more than 100 languages. This has significantly improved its ability to understand, generate and translate nuanced text -- including idioms, poems and riddles -- across a wide variety of languages, a hard problem to solve. PaLM 2 also passes advanced language proficiency exams at the "mastery" level.
Reasoning: PaLM 2's wide-ranging dataset includes scientific papers and web pages that contain mathematical expressions. As a result, it demonstrates improved capabilities in logic, common sense reasoning, and mathematics.
Coding: PaLM 2 was pre-trained on a large quantity of publicly available source code datasets. This means that it excels at popular programming languages like Python and JavaScript, but can also generate specialized code in languages like Prolog, Fortran and Verilog.

Even as PaLM 2 is more capable, it's also faster and more efficient than previous models -- and it comes in a variety of sizes, which makes it easy to deploy for a wide range of use cases. We'll be making PaLM 2 available in four sizes from smallest to largest: Gecko, Otter, Bison and Unicorn. Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline. This versatility means PaLM 2 can be fine-tuned to support entire classes of products in more ways, to help more people.

At I/O today, we announced over 25 new products and features powered by PaLM 2. That means that PaLM 2 is bringing the latest in advanced AI capabilities directly into our products and to people -- including consumers, developers, and enterprises of all sizes around the world. Here are some examples:

PaLM 2's improved multilingual capabilities are allowing us to expand Bard to new languages, starting today. Plus, it's powering our recently announced coding update.
Workspace features to help you write in Gmail and Google Docs, and help you organize in Google Sheets are all tapping into the capabilities of PaLM 2 at a speed that helps people get work done better, and faster.
Med-PaLM 2, trained by our health research teams with medical knowledge, can answer questions and summarize insights from a variety of dense medical texts. It achieves state-of-the-art results in medical competency, and was the first large language model to perform at "expert" level on U.S. Medical Licensing Exam-style questions. We're now adding multimodal capabilities to synthesize information like x-rays and mammograms to one day improve patient outcomes. Med-PaLM 2 will open up to a small group of Cloud customers for feedback later this summer to identify safe, helpful use cases.
