📚 Instadeq Reading List October 2021

Here is a list of content we found interesting this month.

The Ongoing Computer Revolution

Most people thought it was crazy to devote a whole computer to the needs of one person—after all, machines are fast and people are slow. But that’s true only if the person has to play on the machine’s terms. If the machine has to make things comfortable for the person, it’s the other way around. No machine, even today, can yet keep up with a person’s speech and vision.

Today’s PC is about 10,000 times as big and fast as an Alto. But the PC doesn’t do 10,000 times as much, or do it 10,000 times as fast, or even 100 x of either. Where did all the bytes and cycles go? They went into visual fidelity and elegance, integration, backward compatibility, bigger objects (whole books instead of memos), and most of all, time to market.

I'm constantly amazed at the number of people who think that there's not much more to do with computers. Actually, the computer revolution has only just begun.

👤 Butler Lampson

📝 The Ongoing Computer Revolution

Some Thoughts on Interfaces

We want to instantly grasp how to use interfaces without any instruction, even as we hope to be able to solve increasingly complex problems. One of the great myths of interface design is that all interfaces must be simple, and that everything should be immediately intuitive. But these aims are often contradictory - just because something is simple in its visual layout does not mean it will be intuitive! Intuitiveness also is extremely culturally relative - something that may be visually intuitive in one culture, for example, may not be in another; because of everything from language layout, to the role of color, and even the way different cultures process the passing of time.

If we are to empower users to accomplish complex tasks through software, the interface itself may have to be complex. That is not to say that the interface has to be difficult to use! Complex interfaces should, instead, guide the user to an understanding of their capabilities and operation while still keeping them in a flow state. Regardless of how complex the interface is, or the point in the path when the user is learning to use it, they should still be actively engaged in the process, and not become discouraged or feel overwhelmed by the complexity. Interfaces should not shy away from complexity, but should instead guide and assist the user in understanding the complexity.

Why does software not support learning how to use the software inside the software itself?

Why don’t we allow software to teach its users how to use it, without having to rely on these external sources? What would allow for this change to occur?

🐦 @nickarner

📝 Some Thoughts on Interfaces

New Kind of Paper

This new kind of paper understands what you are writing and tries to be smart about what you want.

Are there any programming languages that were designed for pen and paper? Yes, there was one. A Programming Language, also known as APL. A language that started as a notation designed for human-to-human communication of computer programs, usually written with pen on paper, or chalk on a blackboard. To be fair, even APL suffered the transition from blackboard to keyboard. The original notation had sub-/superscripts and the flow of the program was depicted with lines. When APL became a programming language, it was linearized, lost its flowchart-like visuals, but kept its exotic glyphs.

So, if we throw out boxes-and-arrows, i.e. visual programming stuff, what's left? What is the essence of what we are trying do here? Is there a place for a more symbolic, but visually-enriched approach?

Mathematical ideas are conventionally expressed using notation and terminology developed using static media. Suppose, however, that mathematics had been invented after modern computers. This is perhaps difficult to imagine – after all, mathematics helped lead to computers – but let's do the thought experiment anyway. Might mathematical notation have developed in a different way? Would we instead have developed a dynamic, interactive notation more powerful than the static mathematical and linguistic notations in common use today?

🌐 Milan Lajtoš

📝 New Kind of Paper Part 1

📝 New Kind of Paper Part 2

📝 New Kind of Paper Part 3

BI is dead

How an integration between Looker and Tableau fundamentally alters the data landscape.

This could be the beginning of the bifurcation of traditional BI into two worlds: one for data governance and modeling applications, and one for the visualization and analytics applications.

“If you split Looker into LookML and a visualization tool, which one would be BI?” Or, in the terms of this integration, if you have both Looker and Tableau, which one is your BI tool?

My blunt answer is Tableau. You answer your questions in Tableau; BI tools are, above all, where questions get answered.

In this world, the cloud service providers become the major combatants in the market for data infrastructure, while data consumption products designed for end-users and sold on a per-seat basis—including exploration tools, a reconstituted BI, and data apps—are built by the rest of the ecosystem.

🐦 Benn Stancil

📝 BI is dead

Market Research at Bell Labs: Picture Phone vs Mobile Phone

During its long existence Bell Labs developed many revolutionary technologies. Two of them were the Picturephone and the mobile phone; both had market research studies commissioned, with varying levels of accuracy in predicting the actual success of the product.

Picturephone Product Photo

About the Picturephone

AT&T executives had in fact decided to use the fair as an opportunity to quietly commission a market research study. That the fairgoers who visited the Bell System pavilion might not represent a cross section of society was recognized as a shortcoming of the survey results.

...

Users complained about the buttons and the size of the picture unit; a few found it difficult to stay on camera. But a majority said they perceived a need for Picturephones in their business, and a near majority said they perceived a need for Picturephones in their homes.

...

When the AT&T market researchers asked Picturephone users whether it was important to see the person they were speaking to during a conversation, a vast majority said it was either "very important" or "important".

...

Apparently the market researchers never asked users their opinion whether it was important, or even pleasurable, that the person they were speaking with could see them, too.

—The Idea Factory. Page 230-231

Picturephone Use Case Sketch

About the Mobile Phone

A marketing study commissioned by AT&T in the fall of 1971 informed its team that "there was no market for mobile phones at any price."

...

Though Engel didn't perceive it at the time, he later came to believe that marketing studies could only tell you something about the demand for products that actually exist. Cellular phones were a product that people had to imagine might exist.

—The Idea Factory. Page 289

Similar yet Different

But anyone worrying that the cellular project might face the same disastrous fate as the Picturephone might see that it had one advantage. A Picturephone was only valuable if everyone else had a Picturephone. But cellular users didn't only talk to other cellular users. They could talk to anyone in the national or global network. The only difference was that they could move.

—The Idea Factory. Page 289

Resources

📚 Instadeq Reading List September 2021

Here is a list of content we found interesting this month.

Liveness

To build the last-mile of corporate technology, to usher in a new era of computing literacy, and to generally indulge the insatiable appetite of world-munching software, we need live apps that, like spreadsheets, are not treated as finished products, but as building blocks to be extended in real time using low-code techniques.

Live apps are not finished products. They can be extended in real time using low-code techniques that blur the line between user and developer.

🐦 Michael Gummelt

🔗 vision.plato.io

Is BI Dead?

Over the last decade, many of these early BI functions have been stripped out of BI and relaunched as independent products.

Just as the cloud rewrote our expectations of what software is and what it isn’t, the modern data stack is slowly rewriting our expectations of BI.

BI tools should aspire to do one thing, and do it completely: They should be the universal tool for people to consume and make sense of data. If you—an analyst, an executive, or any person in between—have a question about data, your BI tool should have the answer.

The boundary between BI and analytical research is an artificial one. People don’t sit cleanly on one side or the other, but exist along a spectrum (should a PM, for example, use a self-serve tool or a SQL-based one?). Similarly, analytical assets aren’t just dashboards or research reports; they’re tables, drag-and-drop visualizations, narrative documents, decks, complex dashboards, Python forecasts, interactive apps, and novel and uncategorizable combinations of all of the above.

A better, more universal BI tool would combine both ad hoc and self-serve workflows, making it easy to hop between different modes of consumption. Deep analysis could be promoted to a dashboard.

Marrying BI with the tools used by analysts brings everyone together in a single place. A lot of today’s analytical work isn’t actually that collaborative.

🐦 Benn Stancil

🔗 Is BI dead?

What is analytics engineering?

Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions.

While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base.

Today, if you’re a “modern data team” your first data hire will be someone who ends up owning the entire data stack.

On the surface, you can often spot an analytics engineer by the set of technologies they are using (dbt, Snowflake/BigQuery/Redshift, Stitch/Fivetran). But deeper down, you’ll notice they are fascinated by solving a different class of problems than the other members of the data team. Analytics engineers care about problems like:

  • Is it possible to build a single table that allows us to answer this entire set of business questions?

  • What is the clearest possible naming convention for tables in our warehouse?

  • What if I could be notified of a problem in the data before a business user finds a broken chart in Looker?

  • What do analysts or other business users need to understand about this table to be able to quickly use it?

  • How can I improve the quality of my data as it's produced, rather than cleaning it downstream?

The analytics engineer curates the catalog so that the researchers can do their work more effectively.

🐦 dbt

🔗 What is analytics engineering?

Is the modern analytics stack unbundling, or consolidating?

Despite a recent proliferation of tools in the modern data stack, it’s unclear whether we’re seeing an unbundling of data tooling into many separate layers, or the first steps towards consolidation of data tools.

One popular interpretation of this explosion of data tools is that we are witnessing the “unbundling” of the data stack. Under this interpretation, classically monolithic data tools like data warehouses are being dismantled into constituent parts.

However, it’s also possible that this “unbundling” represents a temporary state of affairs. Specifically, under this alternative thesis – which we’ll call “consolidation” – the proliferation of data tools today reflects what will ultimately become a standard set of features within just a few discrete, consolidated layers of the data stack.

If consolidation is so beneficial to users, why are we seeing “unbundling” now? My thesis is that this unbundling is a response to the rapidly-evolving demands on and capabilities of cloud data.

In a nutshell, the data ecosystem is slowly rebuilding the warehouse and analysis layers to adapt to the new reality of cloud data.

In the next two years, I expect we’ll see more attempts to consolidate the modern data stack, albeit in intermediate stages – for example, the consolidation of data pipelines and transformation, data catalogs with metrics layers, and dashboards with diagnostics.

Much of the work – especially in the analysis layer – is spread across an absurd number of tools today – not just business intelligence, but also spreadsheets, docs, and slides. Consolidating this work has the potential to transform the future of work for every modern organization, and to redefine the future of data.

🐦 Peter Bailis

🔗 Is the modern analytics stack unbundling, or consolidating?

Computer Science: A Discipline Misnamed (Fred Brooks)

I've been having conversations lately about topics related to the one in the title, and somehow I got to an article titled The Computer Scientist as Toolsmith II (1994) by Fred Brooks (thanks to whoever recommended it, I wish we had better document/navigation provenance in our tools :)

Here's a summary; from now on it's all quotes from the article, emphasis mine (I wish we had Transclusion in our tools :).

I recommend reading the whole article if you find the quotes below interesting.

A Discipline Misnamed

When our discipline was newborn, there was the usual perplexity as to its proper name.

We at Chapel Hill, following, I believe, Allen Newell and Herb Simon, settled on “computer science” as our department’s name.

Now, with the benefit of three decades’ hindsight, I believe that to have been a mistake.

If we understand why, we will better understand our craft.

What is a Science?

A science is concerned with the discovery of facts and laws.

Perhaps the most pertinent distinction is that between scientific and engineering disciplines.

Distinction lies not so much in the activities of the practitioners as in their purposes.

The scientist builds in order to study; the engineer studies in order to build.

What is our Discipline?

I submit that by any reasonable criterion the discipline we call “computer science” is in fact not a science but a synthetic, an engineering, discipline. We are concerned with making things, be they computers, algorithms, or software systems.

Unlike other engineering disciplines, much of our product is intangible: algorithms, programs, software systems.

Heinz Zemanek has aptly defined computer science as "the engineering of abstract objects".

In a word, the computer scientist is a toolsmith—no more, but no less. It is an honorable calling.

A toolmaker succeeds as, and only as, the users of his tool succeed with his aid.

How can a Name Mislead Us?

If our discipline has been misnamed, so what? Surely computer science is a harmless conceit. What’s in a name? Much. Our self-misnaming hastens various unhappy trends.

First, it implies that we accept a perceived pecking order that respects natural scientists highly and engineers less so, and that we seek to appropriate the higher station for ourselves.

We shall be respected for our accomplishments, not our titles.

Second, sciences legitimately take the discovery of facts and laws as a proper end in itself. A new fact, a new law is an accomplishment, worthy of publication. If we confuse ourselves with scientists, we come to take the invention (and publication) of endless varieties of computers, algorithms, and languages as a proper end. But in design, in contrast with science, novelty in itself has no merit.

If we recognize our artifacts as tools, we test them by their usefulness and their costs, not their novelty.

Third, we tend to forget our users and their real problems, climbing into our ivory towers to dissect tractable abstractions of those problems, abstractions that may have left behind the essence of the real problem.

We talk to each other and write for each other in ever more esoteric vocabularies, until our journals become inaccessible even to our society members.

Fourth, as we honor the more mathematical, abstract, and “scientific” parts of our subject more, and the practical parts less, we misdirect young and brilliant minds away from a body of challenging and important problems that are our peculiar domain, depriving these problems of the powerful attacks they deserve.

Our Namers got the “Computer” Part Exactly Right

The computer enables software to handle a world of complexity not previously accessible to those limited to hand techniques. It is this new world of complexity that is our peculiar domain.

Mathematicians are scandalized by the complexity— they like problems which can be simply formulated and readily abstracted.

Physicists or biologists, on the other hand, are scandalized by the arbitrariness. Complexity is no stranger to them. The deeper the physicists dig, the more subtle and complex the structure of the “elementary” particles they find.

But they keep digging, in full faith that the natural world is not arbitrary, that there is a unified and consistent underlying law if they can but find it.

No such assurance comforts the computer scientist.

The Toolsmith as Collaborator

If the computer scientist is a toolsmith, and if our delight is to fashion power tools and amplifiers for minds, we must partner with those who will use our tools, those whose intelligences we hope to amplify.

The Driving-Problem Approach

Hitching our research to someone else’s driving problems, and solving those problems on the owners’ terms, leads us to richer computer science research.

How can working on the problems of another discipline, for the purpose of enhancing a collaborator, help me as a computer scientist? In many ways:

  • It aims us at relevant problems, not just exercises or toy-scale problems.

  • It keeps us honest about success and failure, so that we don’t fool ourselves so easily.

  • It makes us face the whole problem, not just the easy or mathematical parts. We can’t assume away ill-conditioned cases.

  • Facing the whole problem in turn forces us to learn or develop new computer science, often in areas we otherwise never would have addressed.

  • Besides all of that, it is just plain fun to look over the shoulders of those discovering how proteins work, or designing submarines, or fabricating on the nanometer scale.

Two of our criteria for success in a tool are:

  • It must be so easy to use that a full professor can use it, and

  • It must be so productive that full professors will use it.

Our Journey to a Low-Code Data Lake

Overview

Since November 2014 we have been implementing a platform to collect, store, process, analyse and visualize logs using big data and streaming technologies. The main objective is to detect important events, generate statistics and produce alerts to solve problems in a proactive way.

Application servers (such as Glassfish, SunOne or Weblogic), HTTP Servers (Apache HTTP Server), Databases, Web Applications, Batch Processes, Java Applications, Windows and Linux Servers and related technologies are constantly generating logs with errors, warnings, incidents, accesses, audit logs, executed commands and all kinds of information about their running processes.

Our first pilot with a new customer is usually about application logs but it is a "Trojan Horse". Once this platform is up and running it naturally becomes the centralized repository not only for logs but also for any type of source and data format. With dashboards on display on big screens, other areas and departments start requesting dashboards of their own.

But this is only the beginning of a long road for us and our customers. To adapt to different use cases and new requirements we needed a more modular, flexible and extensible architecture.

As the Data Lake gained prominence within the Ministry of Social Security during 2015 and other project opportunities materialized, challenges and constraints emerged that led us to rethink our initial architecture.

The constraints came in two basic flavors: technical and business. Throughout 2016 we dedicated our spare time to study and evaluate tools to make the leap in quality that we needed. In the next sections we are going to focus on the business constraints and why StreamSets and Instadeq, our Low-Code/No-Code stack, were the answer to them.

Old Architecture [2015-2016]

/galleries/post-images/our-journey-to-a-lowcode-data-lake/old-architecture.png

Challenges

Technical issues are not the only constraints in software architecture design. Even if they are the most important ones, business restrictions are highly influential too:

  1. Data democratization

    We were struggling to keep up with all the data reports and visualizations. We needed a way to allow non-specialists to gather and analyze data without requiring help from our team.

  2. Data visualisation and data-driven decision making

    We were part of meetings where people were making decisions based on intuition or on reports outdated by days, weeks or even months. We needed a platform that allowed non-specialists to make ad-hoc reports on easily accessible, accurate and up-to-date data.

  3. Skill gap between ideal position requirements and market availability

    It was really hard for our local partners to find qualified employees to work in our area. The solution at hand was to hire junior engineers right after their graduation to help in active projects. This increased our need for easy to use tools that allowed new employees to be productive from day one.

  4. Public procurement cycle

    The way projects are structured in the public sector makes it common for us to do 3 to 6 month projects to solve particular problems and then have to wait for months until the next stage is approved for us to go back. The gaps in our presence require the end user to be able to inspect, troubleshoot and make small modifications to running systems without requiring deep technical skills. Our integration and visualization tools should make it easy for them to create new dashboards or modify existing ones.

  5. Reduce manual work on deployment process

    We relied on scripts to automate most tasks but we needed a platform to minimize the manual work. We required a tool to centralize the creation, validation, testing, building and deployment of pipelines in order to eliminate the intermediate steps between our workstations and the integrations running on production.

  6. Data Integration and data visualization as a new commodity

    When we started in 2014, having a centralized Data Lake where our customers could run ad-hoc queries, live dashboards and MapReduce jobs was enough to win projects. After a few months that became a commodity, a starting point.

    Most dashboards, including their data integrations, were required to be live the next morning. We were working in environments with one urgency after another. We needed tools to create new integrations and to consume, parse, store and visualize data as soon as possible.

  7. Monitor data integration pipelines and troubleshoot with clarity

    Both in development and in production we needed tools that allowed anyone to find the cause of issues as quickly and easily as possible; this is much easier when live data is available for inspection. We had tools that were efficient and performant, but when there was a problem it was really hard to find the cause. Rebuilding and redeploying jobs on multiple nodes made the process slow and tedious.

  8. Live and interactive demos

    We started giving many presentations and pre-sales talks. Our audience wanted to see live demos of real projects. We needed something more attractive than config files, bash scripts and a lot of JSON.

  9. Flexible architecture

    Our local partners were looking for partnerships in the Big Data space, which required us to prioritize some vendors over others. Also, some of our customers already had licenses for specific products. We needed an architecture flexible enough to replace one component with another without changing the solution's nature.

  10. Fast analytics on fast data

    Evolving expectations from our customers called for a more powerful data storage engine. It needed to be fast for inserts but also fast for updates and ad-hoc queries and analytics.

Solution: A Low-Code Architecture

StreamSets Data Collector and Instadeq, our Low-Code/No-Code stack, gave us the leap in quality we needed and responded to many of the challenges we had. Both are backed by very solid tools like Kafka, Solr, Hive, HDFS and Kudu. In January 2017 we had our new architecture installed and running in production.

With StreamSets and Instadeq our time to production was shortened by days, thanks to their user-friendly UI, visual approach and drag-and-drop capabilities.

These tools have prebuilt and reusable components that empower non-technical business users to build end-to-end data transformations and visualizations without writing a single line of code. The inclusion of non-technical users in the data lake construction process eliminated the need to hire expensive, specialized developers and promoted data democratization and data-driven decisions.

These intuitive tools solved our problem with the public procurement cycle. Now the end user was able to inspect, troubleshoot and make small modifications to running pipelines and dashboards.

Unlike other low-code tools, whose rigid templates limit what you can build and restrict customization, StreamSets and Instadeq provide components that let you go low level and even extend the platform.

Current Architecture [2017-2021]:

/galleries/post-images/our-journey-to-a-lowcode-data-lake/current-architecture.png

Why StreamSets?

/galleries/post-images/our-journey-to-a-lowcode-data-lake/streamsets.png

The selection process for the ingestion and transformation component in our architecture took the longest time.

We have a deep belief that in order to make a long-term commitment to a tool it is very important to trust not only the latest version of the product but also the team and company behind it.

Since we started with Big Data in November 2014 until we built the new architecture in January 2017, we analyzed and tested multiple tools both in development and production and it was StreamSets that excelled above all of them and continues to do so to this day.

As a product development company we have experience assessing whether a product aligns with our vision of what a Big Data solution should be. We follow a series of defined steps when analyzing any product:

  1. Implement a real use case end-to-end to validate features but also to experience as an end user

  2. Analyze the development team by checking their public code repositories, the clarity of the commit logs, the number of collaborators and the level of activity

  3. Issue tracker responses

  4. Release notes to have an idea of the product’s evolution

  5. How easy it is to extend, build, test, deploy and integrate

  6. Community commitment, reviews, blogs, forums and responses on Stack Overflow

  7. Vision from founders and investors

The feedback we received from the Big Data team confirmed that it was the right choice for data ingestion. We have a flexible architecture that allows interchangeability of components and it is the team that pushes to use StreamSets when there are other alternatives to consider. They are happy to work with a tool that gives them solutions without struggling with it.

The main highlights from our team when comparing it to other tools:

  • Quick and easy troubleshooting during development and in production

  • Detailed logging shows what’s happening

  • Snapshots and previews

  • Handles increasing data volume and number of pipelines with ease

  • Fast data drift adaptation

  • Supports structured and non-structured data

  • Variety of sources, processors, sinks and formats and ease of extension when some component is not available

  • No downtimes for maintenance thanks to pipeline replication

  • Early warnings, threshold rules and alarms

  • Detailed metrics and observability

  • Great integration with the rest of the architecture components such as Kafka, HDFS, Solr and Instadeq

The last big surprise we had was how StreamSets improved our pre-sales presentations. Being a clear and visual tool, it allowed us to make demos where managers could understand what they were going to find at the end of the project. StreamSets and Instadeq allowed us to have functional, clear and highly visual demos, something that's not possible with other ETL tools or frameworks like Spark and Flink.

With our continuous and expanding use of StreamSets as our data integration component we started formalizing some emerging patterns to:

  • Simplify maintenance

  • Avoid downtimes

  • Take advantage of Kafka, HAproxy and pipeline replication to scale integrations with large data volumes or complex transformations

Our StreamSets Data Integration Patterns post contains a list of our most used patterns.

Success Stories

We had great success with this platform with our first customer in 2015 and this led us to new prospects, mainly in government agencies. Our list of implementations and pilots includes:

  • Ministry of Employment and Social Security

    Log centralization and fraud detection jobs around social security pensions and other social programmes, with acknowledgments and articles in technology magazines.

  • Ministry of Health

    Part of the Data Lake and Instadeq Dashboards were used by the Covid Task Force to support the decision-making process and to monitor COVID cases, the Vaccination Programme and the EU Digital COVID Certificate processes.

    We also provided internal tools built with Instadeq to support vaccination centers and doctors accessing information about patients, vaccines and certificates.

  • Central Bank

    Aggregation and visualization of logs from Windows Servers, Event Tracing for Windows (ETW), Internet Information Server (IIS) and .NET applications.

  • Ministry of Education

    Fraud detection jobs on Sick Leaves and school resources management.

  • Telecommunications Services Company

    Monitoring and incident response for their data center, consuming signals from routers, switches and uninterruptible power supplies (UPS).

  • One of the largest media and retail companies in the country

    The pilot focused mainly on applying machine learning to infrastructure logs during the supermarket's Christmas marketing campaigns.

The Future

Now that data integration and visualization are a commodity for our customers thanks to StreamSets, Kafka and Instadeq, our new challenges are to dive deeper into stream processing, security, data governance and machine learning.

We have already implemented a few machine learning cases with Tensorflow, Pandas and Scikit-learn using Docker containers and Jupyter notebooks for development.

We are studying tools for Machine Learning lifecycle management such as Kubeflow, mlflow and Seldon. We have recently implemented Apache Atlas for metadata management and governance capabilities and Apache Ranger for security administration, fine grained authorization and auditing user access.

Upcoming Projects:

  • Immigration and Border Services: Use cases still under analysis to process live data at the country's main airport.

  • Revenue Agency / Taxation Authority: The first use cases include processing terabytes of database query audit logs and middleware logs, using Oracle GoldenGate to stream them to Kafka. The project also includes monitoring of infrastructure and Java application logs.

New Architecture Under Study [2021-]

/galleries/post-images/our-journey-to-a-lowcode-data-lake/future-architecture.png

Our StreamSets Data Integration Patterns

With our continuous and expanding use of StreamSets as our data integration component we started formalizing some emerging patterns that allowed us to:

  • Simplify maintenance

  • Avoid downtimes

  • Take advantage of the use of Kafka, HAproxy and pipeline replication to scale integrations with large data volumes or complex transformations

What follows is a list of some of our most used patterns.

Logical High-Level Data Integration Pattern

As a general rule, we divide each integration into 3 stages that can contain 3 or more pipelines.

/galleries/post-images/our-streamsets-data-integration-patterns/integration-standard.png

The advantages for this approach are:

  • Stage isolation allows changes at a specific stage without affecting the others.

  • Kafka in the middle ensures that if a parser or storage pipeline is down, data continues to be consumed and retained in Kafka for 7 days.

  • Kafka also allows parallel processing with replicated pipelines.

  • The parsed logs pushed to Kafka can be accessed immediately by other tools such as Flink Streaming, KSQL or Machine Learning containers without replicating the rules applied in the StreamSets pipelines.

  • We can have pipelines running and consuming logs on computers outside our cluster and pushing the data to Kafka. The parsing and storage is later centralized in the cluster.

Stage 1 - Consumer Pipeline

A StreamSets pipeline consumes raw data and pushes it into a Kafka topic with the name projectname.raw.integration. For example, projectx.raw.weblogic.

  • This allows us to modify, stop and restart the Parse and Store pipelines without losing the data generated during maintenance windows

  • This pipeline is not needed for agents that write directly to Kafka

  • For some specific cases, the same pipeline or another one also writes the data to HDFS or to external storage that needs the raw data, mainly for sources required for audits.

/galleries/post-images/our-streamsets-data-integration-patterns/1-consumer.png
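
For illustration only, here is a minimal Python sketch (using the kafka-python library) of what this stage does conceptually: accept raw log lines over HTTP and forward them untouched to the raw topic. In our setup this role is played by a StreamSets pipeline (for example, an HTTP Server Origin feeding a Kafka Producer Destination, as in the Tomcat case later in this post), not hand-written code; the broker address and listening port below are assumptions.

from http.server import BaseHTTPRequestHandler, HTTPServer

from kafka import KafkaProducer

RAW_TOPIC = "projectx.raw.weblogic"  # naming convention: projectname.raw.integration
producer = KafkaProducer(bootstrap_servers="cluster-host-01:9092")  # assumed broker

class RawLogHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw payload and push it to Kafka without parsing it;
        # parsing and enrichment happen in the Stage 2 pipeline.
        length = int(self.headers.get("Content-Length", 0))
        producer.send(RAW_TOPIC, self.rfile.read(length))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 4005), RawLogHandler).serve_forever()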

Stage 2 - Parse & Enrichment Pipeline

A second pipeline consumes the raw data from the raw topic, parses and enriches the log and stores the new data in a new topic: projectname.parsed.integration

The most common transformations are:

  • Parse log pattern using Log Parser.

  • Remove unnecessary fields with Field Remover.

  • Rename or change a field's case using Field Renamer.

  • Generate new fields using Expression Evaluators.

  • Enrichment using Redis Lookup or Apache Solr as key value stores. For example:

    • Add an environment field (prod, stage, qa, dev) using the hostname as key

    • Add company details using their tax identification number

    • IP geolocation lookup

  • Convert date to UTC or string to numbers using Field Type Converter.

  • Discard records or route them to different Kafka topics using Stream Selector.

  • Flatten fields with Field Flattener.

  • Some complex business rules written in Jython.

  • Generate fields year, month, day for partitioning in Kudu.

  • Generate Globally Unique Identifier (GUID / UUID).

  • Parse date fields.

The following is our “2 - Apache Access - Parser” pipeline:

/galleries/post-images/our-streamsets-data-integration-patterns/apache-log-parser.png

Steps:

1. Consumes logs from the "raw" topic
2. Converts HTTP body to JSON
3. Pivot Body fields to the root of the Record
4. Route apache error and apache access to different paths

   1.1. Parse apache error
   1.2. Add Kafka destination: parsed.apache_error topic

   2.1. Parse apache access
   2.2. Add Kafka destination: parsed.apache_access topic

5. Flatten fields
6. Remove headers and parsed names
7. Remove fields with raw data
8. Clean host names (lowercase and remove domain)
9. Solve environment using Redis (see the sketch after these steps)
10. Add an ID field with a UUID
11. Convert dates to ISO
12. Convert timestamp fields to Long
13. Convert timestamps to milliseconds
14. Resolve apache access referrer
15. For apache access logs, use the IP address to identify geolocation
    using Apache Solr
16. Write the log to project_name.parsed.apache_access
    or project_name.parsed.apache_error
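
The core of this parse-and-enrich stage can be pictured in plain Python. The sketch below (using the redis client library) shows, for a single record, the kind of work steps 8-11 describe: clean the host name, resolve the environment through Redis, add a UUID and a timestamp. The regular expression, field names and Redis key scheme are assumptions for illustration; the real pipeline uses StreamSets processors (Log Parser, Redis Lookup, Expression Evaluator) rather than custom code.

import re
import uuid
from datetime import datetime, timezone

import redis

r = redis.Redis(host="cluster-host-01")  # assumed Redis host

# Simplified Apache access log pattern (combined format, trimmed).
ACCESS_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_and_enrich(raw_line, hostname):
    # Parse the access log line into named fields.
    match = ACCESS_RE.match(raw_line)
    record = match.groupdict() if match else {"unparsed": raw_line}

    # Clean the host name (lowercase, drop domain) and resolve the
    # environment (prod, stage, qa, dev) using Redis as a key-value store.
    short_host = hostname.lower().split(".")[0]
    env = r.get("env:" + short_host)  # key scheme is an assumption
    record["host"] = short_host
    record["environment"] = env.decode() if env else "unknown"

    # Add a unique ID and an ISO-8601 processing timestamp.
    record["id"] = str(uuid.uuid4())
    record["processed_at"] = datetime.now(timezone.utc).isoformat()
    return record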

Stage 3 - Storage & Visualization Pipelines

The third pipeline consumes the parsed data from the topic and sends it to:

  • A Kudu table

  • HDFS folder

  • External storage (outside our cluster) such as Oracle databases

  • NAS storage for historical and backup such as EMC Isilon

  • Apache Solr

  • External Kafka Brokers (outside our cluster)

  • Instadeq for live dashboards and data exploration

Sometimes we use a single pipeline in StreamSets, but if the destinations have different speeds or if one of them produces more errors that provoke pipeline restarts, we use a different pipeline for each destination.

We keep the number 3 prefix for all of these pipelines because they all consume the logs from the parsed topics but send them to different destinations:

  • 3- Tomcat Access - Kudu Storage (see the sketch below)

  • 3- Tomcat Access - HDFS Storage

  • 3- Tomcat Access - Long-Term Isilon Storage

  • 3- Tomcat Access - Instadeq Dashboard

/galleries/post-images/our-streamsets-data-integration-patterns/3-store.png
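
As a rough picture of what the Kudu storage pipeline does (the "3- Tomcat Access - Kudu Storage" case above), the sketch below consumes enriched records from the parsed topic and applies them to a Kudu table with the kudu-python client. The broker and Kudu master addresses, the consumer group name and the column names are assumptions; in production this is a StreamSets pipeline with a Kafka Consumer Origin and a Kudu Destination, not custom code.

import json

import kudu
from kafka import KafkaConsumer

# Assumed addresses for the Kafka broker and the Kudu master.
client = kudu.connect(host="cluster-host-01", port=7051)
table = client.table("projectx.tomcat_access")
session = client.new_session()

consumer = KafkaConsumer(
    "projectx.parsed.tomcat_access",
    bootstrap_servers="cluster-host-01:9092",
    group_id="tomcat-access-kudu-storage",  # assumed group name
    value_deserializer=lambda v: json.loads(v),
)

for message in consumer:
    record = message.value
    # Column names are assumptions and must match the Kudu table schema.
    op = table.new_insert({
        "id": record["id"],
        "host": record["host"],
        "status": int(record["status"]),
    })
    session.apply(op)
    session.flush()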

Scaling Pipelines

When we have a pipeline with a high data volume, we scale it with:

  • An HAproxy load balancer in front of the consumer pipelines

  • Kafka topic partitioning

  • StreamSets pipeline replication, one pipeline instance per partition

For example, in two customers we receive Tomcat access logs through HTTP posts and we have the following setup:

/galleries/post-images/our-streamsets-data-integration-patterns/lb-and-paralelism-flows.png

HAproxy Load Balancer

We create a new entry in our server running HAproxy: /etc/haproxy/haproxy.cfg

We receive HTTP Requests on port 4005 and forward them to 3 different “1- Tomcat Access - Consumer” StreamSets pipelines running in cluster-host-01, cluster-host-02 and cluster-host-03

frontend tomcat_access
    bind 0.0.0.0:4005
    mode http
    stats enable
    stats refresh 10s
    stats hide-version
    default_backend tomcat_access_servers

backend tomcat_access_servers
    balance roundrobin
    default-server maxconn 20
    server sdc1 cluster-host-01:4005 check port 4005
    server sdc2 cluster-host-02:4005 check port 4005
    server sdc3 cluster-host-03:4005 check port 4005

Kafka: Topic Partitioning

We create projectx.raw.tomcat_access and projectx.parsed.tomcat_access Kafka topics with 3 or more partitions:

/bin/kafka-topics.sh --create \
        --zookeeper <hostname>:<port> \
        --topic projectx.raw.tomcat_access \
        --partitions 3 \
        --replication-factor <number-of-replicating-servers>

/bin/kafka-topics.sh --create \
        --zookeeper <hostname>:<port> \
        --topic projectx.parsed.tomcat_access \
        --partitions 3 \
        --replication-factor <number-of-replicating-servers>

Streamsets: Pipeline Replication

We replicate our pipeline to run in different cluster nodes, each one consuming from one of those partitions.
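
A minimal Python sketch of this replication idea, using kafka-python: because every copy of the pipeline joins the same consumer group, Kafka assigns each copy a different partition of the raw topic, which is what gives us the parallelism. The broker address and group name are assumptions, and the sketch simply forwards raw bytes instead of running the real Stage 2 logic.

from kafka import KafkaConsumer, KafkaProducer

BROKERS = "cluster-host-01:9092"  # assumed broker address

# Run one copy of this process on each of the three cluster nodes.
# All copies share the same group_id, so Kafka assigns each copy a
# different partition of the raw topic -- the same effect as running
# three replicated StreamSets pipelines.
consumer = KafkaConsumer(
    "projectx.raw.tomcat_access",
    bootstrap_servers=BROKERS,
    group_id="tomcat-access-parser",  # assumed group name
)
producer = KafkaProducer(bootstrap_servers=BROKERS)

for message in consumer:
    # The real pipeline parses and enriches here; this sketch just
    # forwards the raw bytes to the parsed topic.
    producer.send("projectx.parsed.tomcat_access", message.value)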

1- Tomcat Access - Consumer

  • Three pipeline instances running in three different nodes

  • Consume logs using an HTTP Server Origin listening on port 4005

  • Write the raw logs to Kafka using a Kafka Producer Destination

  • Target Kafka topic: projectx.raw.tomcat_access

2- Tomcat Access - Parser

  • Three pipeline instances running in three different nodes

  • Consume raw logs from Kafka using a Kafka Consumer Origin

  • Source topic: projectx.raw.tomcat_access

  • Parse, enrich and filter the logs

  • Write them to Kafka using a Kafka Producer Destination

  • Target topic: projectx.parsed.tomcat_access

3- Tomcat Access - Storage

  • Three pipeline instances running in three different nodes

  • Consume enriched Tomcat access logs from Kafka using a Kafka Consumer Origin

  • Source Kafka topic: projectx.parsed.tomcat_access

  • Store them in a Kudu table using a Kudu Destination.

  • Target Kudu table: projectx.tomcat_access

3- Tomcat Access - Instadeq Dashboard

  • A single pipeline instance

  • Consume enriched Tomcat Access logs from the three Kafka partitions using a Kafka Consumer Origin

  • Source Kafka Topic: projectx.parsed.tomcat_access

  • Send them to Instadeq using Instadeq webhooks (a minimal sketch follows below)
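
A sketch of this last pipeline in Python: a single consumer reads the enriched records from all three partitions and forwards each one to an Instadeq webhook. The webhook URL is hypothetical (the real endpoint is provided by Instadeq), and the broker address and group name are assumptions; in production this is done by the StreamSets pipeline itself.

import json

import requests
from kafka import KafkaConsumer

# Hypothetical webhook URL; the real endpoint is provided by Instadeq.
WEBHOOK_URL = "https://instadeq.example/webhook/tomcat-access"

consumer = KafkaConsumer(
    "projectx.parsed.tomcat_access",
    bootstrap_servers="cluster-host-01:9092",  # assumed broker address
    group_id="tomcat-access-dashboard",        # single consumer, reads all partitions
    value_deserializer=lambda v: json.loads(v),
)

for message in consumer:
    # Push each enriched record to the live dashboard webhook.
    requests.post(WEBHOOK_URL, json=message.value)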

Examples

  1. Linux logs from RSysLog to a Kudu table using StreamSets for data ingestion and transformation

/galleries/post-images/our-streamsets-data-integration-patterns/linux-logs.png

  2. Java application logs sent to the cluster using log4j or logback, with StreamSets for data ingestion and transformation, KSQL for streaming analytics, Kudu for storage and Instadeq for live dashboards

/galleries/post-images/our-streamsets-data-integration-patterns/app-logs.png

  3. From Redmine to Instadeq Dashboard using StreamSets: direct integration without Kafka or any storage in the middle

/galleries/post-images/our-streamsets-data-integration-patterns/sdc-redmine-to-instadeq.png

📚 Instadeq Reading List August 2021

Here is a list of content we found interesting this month.

📑 A Brief History of Human Computer Interaction Technology by Brad Myers

Great list of HCI technologies by Brad A. Myers, organized into categories, sorted by year and with references to all of them.

🐦 Brad Myers

🔗 Page: A Brief History of Human Computer Interaction Technology

🔗 ACM Walled Article: A brief history of human-computer interaction technology

Make sure to also check Brad A. Myers' YouTube channel for great HCI content

🎈 Design Principles Behind Smalltalk by Dan Ingalls

How many programming languages define themselves this way?

The purpose of the Smalltalk project is to provide computer support for the creative spirit in everyone. Our work flows from a vision that includes a creative individual and the best computing hardware available. We have chosen to concentrate on two principle areas of research: a language of description (programming language) that serves as an interface between the models in the human mind and those in computing hardware, and a language of interaction (user interface) that matches the human communication system to that of the computer.

Glamorous Toolkit vibes here:

Reactive Principle: Every component accessible to the user should be able to present itself in a meaningful way for observation and manipulation.

Interesting observation:

An operating system is a collection of things that don't fit into a language. There shouldn't be one.

🐦 Dan Ingalls

🔗 Design Principles Behind Smalltalk

🧑‍🎨 End-user computing by Adam Wiggins

Experts want choice; newbies want to be handed an integrated product where good choices have been made for them and they can dive straight into their task.

...

Too much focus on the technology (e.g., programming language) and too little focus on the user’s task.

...

Most laypeople don't care about computers; they care about what they can use a computer for

...

Most people will only care about computer programming when it offers them a clear way to accomplish specific goals that are relevant to their lives.

...

These tools all share the same two golden traits: no-fuss setup, and a programming language and development tools focused on the specific tasks their users want to achieve.

🐦 Adam Wiggins

🔗 End-user computing

📱 Collecting my thoughts about notation and user interfaces by Matt Webb

So Lynch’s five primitives comprise a notation.

It’s composable. A small number of simple elements can be combined, according to their own grammar, for more complex descriptions. There’s no cap on complexity; this isn’t paint by numbers. The city map can be infinitely large.

Compositions are shareable. And what’s more, they’re degradable: a partial map still functions as a map; one re-drawn from memory on a whiteboard still carries the gist. So shareable, and pragmatically shareable.

Not only are maps in this notation functional for communication, but it’s possible to look at a sketched city map and deconstruct it into its primitive elements (without knowing Lynch’s system) and see how to use those elements to extend or correct the map, or create a whole new one. So the notation is learnable.

🐦 Matt Webb

🔗 Collecting my thoughts about notation and user interfaces

🧰 Computers are so easy that we've forgotten how to create

Not all jobs will require coding, at least not yet. Rather, what we are going to need – as a society – is a certain amount of computational thinking in this increasingly technological world.

And in this way, computer programming is indeed the future. Programming can teach you a structured way of thinking, but it can also provide a way of recognising what is possible in the technological realm.

...

Why should we have to rely on a priestly class of experts who are the sole inheritors of a permission to play?

...

My dad never intended to sell his games; they were for our family alone. He was a computer user, but he was also a creator.

🔗 Get under the hood

Japan's Fifth Generation Computer Systems: Success or Failure?

This post is a summary of content from papers covering the topic. It's mostly quotes from papers from 1983, 1993 and 1997 with some editing; references to the present and future depend on the paper but should be easy to deduce. See the Sources section at the end.

Introduction

In 1981, the emergence of the government-industry project in Japan known as Fifth Generation Computer Systems (FGCS) was unexpected and dramatic.

The Ministry of International Trade and Industry (MITI) and some of its scientists at Electrotechnical Laboratory (ETL) planned a project of remarkable scope, projecting both technical daring and major impact upon the economy and society.

This project captured the imagination of the Japanese people (e.g. a book in Japanese by Junichiro Uemae recounting its birth was titled The Japanese Dream).

It also captured the attention of the governments and computer industries of the USA and Europe, who were already wary of Japanese takeovers of important industries.

A book by Feigenbaum and McCorduck, The Fifth Generation, was a widely-read manifestation of this concern.

The Japanese plan was grand but it was unrealistic, and was immediately seen to be so by the MITI planners and ETL scientists who took charge of the project.

A revised planning document was issued in May 1982 that set more realistic objectives for the Fifth Generation Project.

/galleries/post-images/fgcs/conceptual-diagram.png

Previous Four Generations

  • First generation: ENIAC, invented in 1946, and others that used vacuum tubes.

  • Second generation: IBM 1401, introduced in 1959, and others that used transistors.

  • Third generation: IBM S/360, introduced in 1964, and others that used integrated circuits.

  • Fourth generation: IBM E Series, introduced in 1979, and others that used very large-scale integrated circuits (VLSI), which massively increased computational capacity but were still based on the von Neumann architecture and required specific and precise commands to perform a task.

FGCS was conceived as a computer that can infer from an incomplete instruction, by making use of the knowledge it has accumulated in its database.

FGCS was based on an architecture distinct from that of the previous four generations of computers which had been invented by Von Neumann and commercially developed by IBM among others.

The Vision

  1. Increased intelligence and ease of use so that they will be better able to assist man. Input and output using speech, voice, graphics, images and documents, conversation in everyday language, the ability to put stored knowledge to practical use, and the ability to learn and reason.

  2. To lessen the burden of software generation in order that a high level requirements specification is sufficient for automatic processing, so that program verification is possible thus increasing the reliability of software. Also the programming environment has to be improved while it should also be possible to use existing software assets.

  3. To improve overall functions and performance to meet social needs. The construction of light, compact, high-speed, large capacity computers which are able to meet increased diversification and adaptability, which are highly reliable and offer sophisticated functions.

Objectives

The objective of this project is to realise new computer systems to meet the anticipated requirements of the 1990s.

Everybody will be using computers in daily life without thinking anything of it. For this objective, an environment will have to be created in which a man and a computer find it easy to communicate freely using multiple information media, such as speech, text, and graphs.

The functions of FGCSs may be roughly classified as follows:

  1. Problem-solving and inference

  2. Knowledge-base management

  3. Intelligent interface

The intelligent interface function will have to be capable of handling man/machine communication in natural languages, speeches, graphs, and images so that information can be exchanged in a way natural to a man.

There will also be research into and development of dedicated hardware processors and high-performance interface equipment for efficiently executing processing of speech, graph, and image data.

Several basic application systems will be developed with the intention of demonstrating the usefulness of the FGCS and the system evaluation. These are machine translation systems, consultation systems, intelligent programming systems and an intelligent VLSI-CAD system.

The key technologies for the Fifth Generation Computer System seem to be:

  • VLSI architecture

  • Parallel processing such as data flow control

  • Logic programming

  • Knowledge base based on relational database

  • Applied artificial intelligence and pattern processing

Project Requirements

  1. Realisation of basic mechanisms for inference, association, and learning in hardware, making them the core functions of the Fifth Generation computers.

  2. Preparation of basic artificial intelligence software in order to fully utilise the above functions.

  3. Advantageous use of pattern recognition and artificial intelligence research achievements, in order to realise man/machine interfaces that are natural to man.

  4. Realisation of support systems for resolving the 'software crisis' and enhancing software production.

It will be necessary to develop high performance inference machines capable of serving as core processors that use rules and assertions to process knowledge information.

Existing artificial intelligence technology has been developed to be based primarily on LISP. However, it seems more appropriate to employ a Prolog-like logic programming language as the interface between software and hardware due to the following considerations: the introduction of VLSI technology made possible the implementation of high level functions in hardware; in order to perform parallel processing, it will be necessary to adopt new languages suitable for parallel processing; such languages will have to have a strong affinity with relational data models.

Research and development will be conducted for a parallel processing hardware architecture intended for parallel processing of new knowledge bases, and which is based on a relational database machine that includes a high-performance hierarchical memory system, and a mechanism for parallel relational operations and knowledge operations.

The knowledge base system is expected to be implemented on a relational database machine which has some knowledge base facilities in the Fifth Generation Computer System, because the relational data model has a strong affinity with logic programming.

Relational calculus has a close relation with the first order predicate logic. Relational algebra has the same ability as relational calculus in the description of a query. These are reasons for considering a relational algebra machine as the prime candidate for a knowledge base machine.

Risks

There is no precedent for this innovative and large-scale research and development anywhere in the world. We will therefore be obliged to move toward the target systems through a lengthy process of trial and error, producing many original ideas along the way.

Timeline / Plan

(1982-1984) Initial Stage

During the initial stage, research was conducted on the basic technologies for FGCS. The technologies developed included:

  1. ESP (extended self-contained Prolog), a sequential logic-programming language based on Prolog.

  2. PSI (personal sequential inference machine), the world's first sequential inference computer to incorporate a hardware inference engine.

  3. SIMPOS (sequential inference machine programming and operating system), the world's first logic-programming-language-based operating system written with ESP for the PSI.

  4. GHC (guarded horn clauses), a new parallel-logic language for the implementation of parallel inference.

(1985-1988) Intermediate Stage

During the intermediate stage, research was done on the algorithms needed for implementation of the subsystems that would form the basis of FGCS and on the basic architecture of the new computer.

Furthermore, on the basis of this research, small and medium-sized subsystems were developed. The technologies developed included:

  1. KL1, a logic language for parallel inference.

  2. PIMOS (parallel inference machine operating system), a parallel-machine operating system based on the use of KL1 (Kernel Language 1).

  3. KAPPA (knowledge application oriented advanced database and knowledge base management system), a knowledge-base management system capable of handling large amounts of complex knowledge.

  4. MultiPSI, an experimental parallel inference machine consisting of 64 element processors linked together in the form of a two-dimensional lattice.

(1989-1992) Final Stage

During the final stage, the object was to put together a prototype fifth generation computer based on the technologies developed during the two preceding stages. The project team developed a number of additional features including:

  1. PIM (parallel inference machine), a parallel inference computer consisting of 1000 linked element processors.

  2. Improvement of PIMOS.

  3. KAPPA-p, a parallel data-management system.

  For the knowledge programming system, the team also developed:

  4. Interactive interface technology.

  5. Problem-solving programming technology.

  6. Knowledge-base creation technology.

  7. To test the prototype system, the team also carried out research into the integration and application of parallel programming technology.

  8. Several application software programs were developed to run on the PIM.

(1993-1994) Wrap Up

The project continued on a more limited scale during 1993 and 1994.

In addition to follow-up research on, say, a new KL1 programming environment (called KL1C) on sequential and parallel UNIX-based machines, many efforts were made to disseminate FGCS technologies, for instance, to distribute free ICOT software and to disclose technical data on the Internet.

Why not a Generation Evolution?

For computers to be employed at numerous application levels in the 1990s, they must evolve from machines centered around numerical computations to machines that can assess the meaning of information and understand the problems to be solved.

Non-numeric data such as sentences, speeches, graphs, and images will be used in tremendous volume compared to numerical data.

Computers are expected to deal with non-numeric data mainly in future applications. However, present computers have much less capability in non-numeric data processing than in numeric data processing.

The key factors leading to the necessity for rethinking the conventional computer design philosophy just described include the following:

  1. Device speeds are approaching the limit imposed by the speed of light.

  2. The emergence of VLSI reduces hardware costs substantially, and an environment permitting the use of as much hardware as is required will shortly be feasible.

  3. To take advantage of the effect of VLSI mass production, it will be necessary to pursue parallel processing.

  4. Current computers have extremely poor performance in basic functions for processing speeches, texts, graphs, images and other nonnumerical data, and for artificial intelligence type processing such as inference, association, and learning.

The research and development targets of the FGCS are such core functions of knowledge information processing as problem-solving and inference systems and knowledge-base systems that cannot be handled within the framework of conventional computer systems.

Results

With the Fourth Conference on Fifth Generation Computer Systems, held June 1-5, 1992 in Tokyo, Japan, an era came to an end.

This section quotes different people analyzing the results, so it won't be fully consistent.

Since then ten years have passed in which ICOT grew to about 100 researchers and spent about 54 billion Yen, that is some 450 million US$. In these ten years a large variety of machines have been built ranging from the special purpose PSI machine, that is a personal sequential inference machine, to several constellations of processors and memory ranging from 16 to 512 processing elements together forming the PIM family, that is the Parallel Inference Machine.

Overreaction

Some people overreacted and even spoke of a technological war. Today some people overreact again: as they see that their fears have not materialized, they regard the project as a failure.

Evaluation

Scientific:

  • ✅ Hardware: use of parallelism

  • ✅ Software: use of logic programming

  • ✅ Applications

    • ❌ No natural language, no pattern recognition

  • ❌ Break-through in architecture

  • ❌ Break-through in software

Economic:

  • ✅ Impact on Japanese researchers

  • ❌ Impact on Japanese hardware makers

Social:

  • ✅ International scientific reputation

    • ❌ But no solution to social problems in Japan

Positive

  • ICOT has shown the ability of Japan to innovate in computer architectures.

  • The ICOT architectures' peak parallel performance is within the range of the original performance goals.

  • The PIMs represent an opportunity to study tradeoffs in parallel symbolic computing which does not exist elsewhere.

  • KL1 is an interesting model for parallel symbolic computation, but one which is unlikely to capture the imagination of US researchers.

  • PIMOS has interesting ideas on control of distribution and communication which US researchers should evaluate seriously.

  • ICOT has been one of the few research centers pursuing parallel symbolic computations.

  • ICOT has been the only center with a sustained effort in this area.

  • ICOT has shown significant (i.e. nearly linear) acceleration of non-regular computations (i.e. those not suitable for data parallelism or vectorized pipelining).

  • ICOT created a positive aura for AI, Knowledge Based Systems, and innovative computer architectures. Some of the best young researchers have entered these fields because of the existence of ICOT.

Negative

  • ICOT has done little to advance the state of knowledge based systems, or Artificial Intelligence per se.

  • ICOT's goals in the area of natural language were either dropped or spun out to EDR.

  • Other areas of advanced man-machine interfacing were dropped.

  • Research on very large knowledge bases was substantially dropped.

  • ICOT's efforts have had little to do with commercial application of AI technology. The choice of language was critical.

  • ICOT's architectures have been commercial failures: they required both a switch in programming model and the purchase of cost-ineffective hardware.

  • ICOT hardware has lagged behind US hardware innovation (e.g. the MIT Lisp Machine and its descendants and the MIT Connection Machine and its descendants).

  • Application systems of the scale described in the original goals have not been developed (yet).

  • Very little work on knowledge acquisition.

The early documents discuss the management of very large knowledge bases, of large scale natural language understanding and image understanding with a strong emphasis on knowledge acquisition and learning. Each of these directions seems to have been either dropped, relegated to secondary status, absorbed into the work on parallelism or transferred to other research initiatives.

The ICOT work has tended to be a world closed in upon itself. In both the sequential and parallel phases of their research, there has been a new language developed which is only available on the ICOT hardware. Furthermore, the ICOT hardware has been experimental and not cost effective. This has prevented the ICOT technology from having any impact on or enrichment from the practical work.

Changes

It is remarkable how little attention was given to the notion of parallel processing, even though this notion turned out to be of such great importance for the whole project.

First, in my opinion the original goal of the FGCS project changed its emphasis from what has been described above as primarily a knowledge information processing system, KIPS, with very strong capabilities in man-machine interaction, such as natural language processing, into the following:

A computer system which is:

  • Easy to use intellectually

  • Fast in solving complex problems

In combining the two ideals:

  • Efficient for the mind

  • Efficient for the machine

The intellectual process of translating a problem into the solution of that problem should be simple. By exploiting sophisticated (parallel processing) techniques the computer should be fast.

Research Impact

Japan has indeed proved that it has the vision to take a lead for the rest of the world.

They acted wisely and offered the results to the international public for free use, thus acting as a leader to the benefit of mankind and not only for its own self-interest.

One of the major results and successes of the FGCS project is its effect on the infrastructure of Japanese research and development in information technology.

The technical achievements of ICOT are impressive. Given the novelty of the approaches, the lack of background, the difficulties to be solved, the amount of work done which has delivered something of interest is purely amazing; this is true in hardware as well as in software.

The fulfillment of the vision, should I say working on the "grand plan" and bringing benefits to the society, is definitely not at the level that some people anticipated when the project was launched. This is not, to me, a surprise at all, i.e. I have never believed that very significant parts of this grand plan could be successfully tackled.

Overall, the project has had a major scientific impact, in furthering knowledge throughout the world of how to build advanced computing systems.

I agree that the international impact of the project was not as large as one hoped for in the beginning. I think all of us who believed in the direction taken by the project, i.e. developing integrated parallel computer systems based on logic programming, hoped that by the end of the 10-year period the superiority of the logic programming approach would be demonstrated beyond doubt, and that commercial applications of this technology would be well on their way. Unfortunately, this has not been the case. Although ICOT has reached its technological goals, the applications it has developed were sufficient to demonstrate the practicality of the approach, but not its conclusive superiority.

Lessons Learned

  1. Be aware that government-supported industrial consortia may not be able to 'read the market', particularly over the long term.

  2. Do not confuse basic research and advanced development.

  3. Expect negative results but hope for positive. Mid-course corrections are a good thing.

  4. Have vision. The vision is critical: people need a big dream to make it worthwhile to get up in the morning.

Logic Programming

It certainly provided a tremendous boost to research in logic programming.

I was expecting however to see 'actual use' of some of the technology at the end of the project. There are three ways in which this could have happened.

The first way would have been to have real-world applications, in user terms; only a little of that can be seen at this stage, even though the efforts to develop demonstrations are not to be underestimated.

The second would have been to the benefit of computer systems themselves. This does not appear to be directly happening, at least not now and this is disappointing if only because the Japanese manufacturers have been involved in the FGCS project, at least as providers of human resources and as subcontractors.

The third way would have been to impact computer science outside of the direct field in which this research takes place: for example to impact AI, to impact software engineering, etc.; not a lot can yet be seen, but there are promising signs.

I am genuinely impressed by the scientific achievements of this remarkable project. For the first time in our field, there is a uniform approach to both hardware and software design through a single language, viz. KL1.

It is nearly unbelievable how much software was produced in about two and a half years written directly or indirectly in KL1.

There are at least three aspects to what has been achieved in KL1:

First the language itself is an interesting parallel programming language. KL1 bridges the abstraction gap between parallel hardware and knowledge based application programs. Also it is a language designed to support symbolic (as opposed to strictly numeric) parallel processing. It is an extended logic programming language which includes features needed for realistic programming (such as arrays).

However, it should also be pointed out that like many other logic programming languages, KL1 will seem awkward to some and impoverished to others.

Second is the development of a body of optimization technology for such languages. Efficient implementation of a language such as KL1 required a whole new body of compiler optimization technology.

The third achievement is noticing where hardware can play a significant role in supporting the language implementation.

By Companies

The main companies involved in the project were Fujitsu, Hitachi, Mitsubishi Electric, NEC, Oki, Toshiba, Matsushita Electric Industrial and Sharp.

Almost all companies we interviewed said that ICOT's work had little direct relevance to them.

The reasons most frequently cited were the high cost of the ICOT hardware, the choice of Prolog as a language, and the concentration on parallelism.

However, nearly as often our hosts cited the indirect effect of ICOT: the establishment of a national project with a focus on 'fifth generation technology' had attracted a great deal of attention for Artificial Intelligence and knowledge based technology.

Several sites commented on the fact that this had attracted better people into the field and lent an aura of respectability to what had been previously regarded as esoteric.

Hardware

During the first 3-year phase of the project, the Personal Sequential Inference machine (PSI 1) was built and a reasonably rich programming environment was developed for it.

Like the MIT machine, PSI was a microprogrammed processor designed to support a symbolic processing language. The symbolic processing language played the role of a broad-spectrum 'Kernel language' for the machine, spanning the range from low-level operating system details up to application software. The hardware and its microcode were designed to execute the kernel language with high efficiency. The machine was a reasonably high-performance workstation with good graphics, networking and a sophisticated programming environment. What made PSI different was the choice of language family. Unlike more conventional machines, which are oriented toward numeric processing, or the MIT machine, which was oriented towards LISP, the language chosen for PSI was Prolog.

The choice of a logic programming framework for the kernel language was a radical one since there had been essentially no experience anywhere with using logic programming as a framework for the implementation of core system functions.

Several hundred PSI machines were built and installed at ICOT and collaborating facilities; and the machine was also sold commercially. However, even compared to specialized Lisp hardware in the US, the PSI machines were impractically expensive. The PSI (and other ICOT) machines had many features whose purpose was to support experimentation and whose cost/benefit tradeoff had not been evaluated as part of the design; the machines were inherently non-commercial.

The second 3-year phase saw the development of the PSI 2 machine, which provided a significant speedup over PSI 1. Towards the end of Phase 2, a parallel machine (the Multi-PSI) was constructed to allow experimentation with the FGHC paradigm. This consisted of an 8 × 8 mesh of PSI 2 processors, running the ICOT Flat Guarded Horn Clause language KL1.

The abstract model of all PIMs consists of a loosely coupled network connecting clusters of tightly coupled processors. Each cluster is, in effect, a shared memory multiprocessor; the processors in the cluster share a memory bus and implement a cache coherency protocol. Three of the PIMs are massively parallel machines.

Multi-PSI is a medium-scale machine built by connecting 64 PSIs in a mesh architecture.

Even granting that special architectural features of the PIM processor chips may lead to a significant speedup (say a factor of 3 to be very generous), these chips are disappointing compared to the commercial state of the art.

Specialized Hardware

Another most important issue, of a completely different nature, is the question of whether ICOT was wise to concentrate so much effort on building specialised hardware for logic programming, as opposed to building, or using off the shelf, more general-purpose hardware not targeted at any particular language or programming paradigm. The problem with designing specialised experimental hardware is that any performance advantage that can be gained is likely to be rapidly overtaken by the ever-continuing rapid advance of commercially available machines, both sequential and parallel. ICOT's PSI machines are now equalled if not bettered for Prolog and CCL performance by advanced RISC processors.

Many are skeptical about the need for special purpose processors and language dedicated machines. The LISP machines failed because LISP was as fast, or nearly as fast, implemented via a good compiler on a general purpose machine. The PSI machines surely do not have a market because the latest Prolog compilers, compiling down to RISC instructions and using abstract interpretation to help optimize the code, deliver comparable performance.

It is interesting to compare the PIMs to Thinking Machines Inc.'s CM-5; this is a massively parallel machine which is a descendant of the MIT Connection Machine project. The CM-5 is the third commercial machine in this line of development.

Although the Connection Machine project and ICOT started at about the same time, the CM-5 is commercially available and has found a market within which it is cost effective.

Demo Applications

I think that this was the result of the applications being developed in an artificial set-up. I believe applications should be developed by people who need them, and in the context where they are needed.

In general, I believe that too little emphasis was placed on building the best versions of applications on the machines (as opposed to demonstration versions).

In a nutshell, the following has been achieved: for a number of complicated applications in quite diverse areas, ranging from molecular biology to law, it has been shown that it is indeed possible to exploit the techniques of (adapted) logic programming, LP, to formulate the problems and to use the FGCS machines to solve them in a scalable way; that is, parallelism could indeed be used profitably.

The demonstrations involved:

  1. A Diagnostic and control expert system based on a plant model

  2. Experimental adaptive model-based diagnostic system

  3. Case-based circuit design support system

  4. Co-HLEX: Experimental parallel hierarchical recursive layout system

  5. Parallel cell placement experimental system

  6. High level synthesis system

  7. Co-LODEX: A cooperative logic design expert system

  8. Parallel LSI router

  9. Parallel logic simulator

  10. Protein sequence analysis program

  11. Model generation theorem prover: MGTP

  12. Parallel database management system: Kappa-P

  13. Knowledge representation language: QUIXOTE

  14. A parallel legal reasoning system: HELIC-II

  15. Experimental motif extraction system

  16. MENDELS ZONE: A concurrent program development system

  17. Parallel constraint logic programming system: GDCC

  18. Experimental system for argument text generation: Dulcinea

  19. A parallel cooperative natural language processing system: Laputa

  20. An experimental discourse structure analyzer

They have some real success on a technical level, but haven't produced applications that will make a difference (on the world market).

Programming Languages

Two extended Prolog-like languages (ESP and KL0) were developed for PSI-1. ESP (Extended Self Contained Prolog) included a variety of features such as coroutining constructs, non-local cuts, etc. necessary to support system programming tasks as well as more advanced Logic Programming. SIMPOS, the operating system for the PSI machines, was written in ESP.

Phase 3 has been centered around the refinement of the KL1 model and the development of massively parallel hardware systems to execute it.

KL1 had been refined into a three-level language:

  • KL1-b is the machine-level language underlying the other layers

  • KL1-c is the core language used to write most software; it extends the basic FGHC paradigm with a variety of useful features such as a macro language

  • KL1-p includes the 'pragmas' for controlling the implementation of the parallelism

Much of the current software is written in higher level languages embedded in KL1, particularly languages which establish an object orientation. Two such languages have been designed: A'UM and AYA. Objects are modeled as processes communicating with one another through message streams.
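
A'UM and AYA themselves are only thinly documented in these sources, but the underlying idea (an object is a process that consumes a stream of messages) can be illustrated with a small Python sketch. The class and method names below are mine, not ICOT's; this is an analogy, not KL1.

import queue
import threading

class StreamObject:
    """A toy 'object as process': a thread that serves messages arriving on a stream (queue)."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        while True:
            method, args, reply = self.inbox.get()
            if method == "stop":
                break
            result = getattr(self, method)(*args)
            if reply is not None:
                reply.put(result)

    def send(self, method, *args, wait=False):
        # Sending a message appends it to the object's stream; optionally wait for a reply.
        reply = queue.Queue() if wait else None
        self.inbox.put((method, args, reply))
        return reply.get() if wait else None

class Counter(StreamObject):
    def __init__(self):
        self.value = 0
        super().__init__()

    def add(self, n):
        self.value += n
        return self.value

if __name__ == "__main__":
    c = Counter()
    c.send("add", 1)                    # asynchronous message
    print(c.send("add", 2, wait=True))  # synchronous request/reply -> 3
    c.send("stop")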

Two constraint logic programming languages were also developed at ICOT. The first, CAL (Constraint Avec Logique), is a sequential constraint logic programming language that includes algebraic, Boolean, set and linear constraint solvers.

The second, GDCC (Guarded Definite Clauses with Constraints), is a parallel constraint logic programming language with algebraic, Boolean, linear and integer parallel constraint solvers.

Prolog vs LISP

Achieving such revolutionary goals would seem to require revolutionary techniques. Conventional programming languages, particularly those common in the late 1970s and early 1980s, offered little leverage.

The requirements clearly suggested the use of a rich, symbolic programming language capable of supporting a broad spectrum of programming styles.

Two candidates existed: LISP which was the mainstream language of the US Artificial Intelligence community and Prolog which had a dedicated following in Europe.

LISP had been used extensively as a systems programming language and had a tradition of carrying with it a featureful programming environment; it also had already become a large and somewhat messy system. Prolog, in contrast, was small and clean, but lacked any experience as an implementation language for operating systems or programming environments.

OS

Multi-PSI supported the development of the ICOT parallel operating system (PIMOS) and some initial small scale parallel application development. PIMOS is a parallel operating system written in KL1; it provides parallel garbage collection algorithms, algorithms to control task distribution and communication, a parallel file system, etc.

AI

Interest in AI (artificial intelligence) boomed around that time, and companies started to realize the potential value of FGCS research as a complement to their own AI research.

Databases

In the area of databases, ICOT has developed a parallel database system called Kappa-P. This is a 'nested relational' database system based on an earlier ICOT system called Kappa. Kappa-P is a parallel version of Kappa, re-implemented in KL1.

It also adopts a distributed database framework to take advantage of the ability of the PIM machines to attach disk drives to many of the processing elements. Quixote is a Knowledge Representation language built on top of Kappa-P.

It is a constraint logic programming language with object-orientation features such as object identity, complex objects described by the decomposition into attributes and values, encapsulation, type hierarchy and methods. ICOT also describes Quixote as a Deductive Object Oriented Database (DOOD).
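
Kappa-P's internals aren't described in these sources, but the 'nested relational' idea itself is easy to picture. Here is a minimal Python sketch with invented data in the spirit of the protein-analysis demos: instead of forcing repeating groups into extra rows, an attribute may itself hold a set of values.

# Flat relational style: the repeating group forces one row per (gene, motif) pair.
flat_rows = [
    {"gene": "G1", "organism": "E. coli", "motif": "ATG"},
    {"gene": "G1", "organism": "E. coli", "motif": "TATA"},
    {"gene": "G2", "organism": "E. coli", "motif": "ATG"},
]

# Nested relational style: a set-valued attribute keeps one tuple per gene.
nested_rows = [
    {"gene": "G1", "organism": "E. coli", "motifs": {"ATG", "TATA"}},
    {"gene": "G2", "organism": "E. coli", "motifs": {"ATG"}},
]

def genes_with_motif_flat(rows, motif):
    # Selection over the flat relation has to deduplicate the repeated gene rows.
    return sorted({r["gene"] for r in rows if r["motif"] == motif})

def genes_with_motif(rows, motif):
    # Selection over the nested relation tests membership in the set-valued attribute.
    return [r["gene"] for r in rows if motif in r["motifs"]]

print(genes_with_motif_flat(flat_rows, "ATG"))  # ['G1', 'G2']
print(genes_with_motif(nested_rows, "ATG"))     # ['G1', 'G2']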

Theorem Proving

Another area explored is that of Automatic Theorem Proving. ICOT has developed a parallel theorem prover called MGTP (Model Generation Theorem Prover).

Fun Trivia

The one commercial use we saw of the PSI machines was at Japan Air Lines, where the PSI-II machines were employed; ironically, they were remicrocoded as Lisp Machines.

Sources

Conclusion

This section has no quotes from sources; these are my observations:

I find that for everything you can point to as a reason for its "failure", there's an alternative universe where you could point to the same things as reasons for its success.

For example:

  • Starting from scratch

  • Thinking from first principles

  • Radical changes to the status quo

  • Single focus

  • Vertical integration

  • Risky bet on nascent technologies

  • Specialized Hardware

The only reason that is clearly negative is that the demo applications were not developed around real use cases involving real users, which may have made it harder to show the value of FGCS to end users and industrial partners.

It would be easy to mention Worse is Better, but if it keeps coming up as a reason, maybe we should pay more attention to it?

My conclusion right now is that technically they achieved many of the things they planned: they succeeded at going where they thought the puck was going to be in the 90s, but it ended up somewhere else.

What's your conclusion?

Past Futures of Programming: General Magic's Telescript

Introduction

Telescript was a programming language developed by General Magic in the nineties that allowed the first generation of mobile devices to interact with services in a network.

This sounds similar to the way smartphones work today, but the paradigm that Telescript supported, called "Remote Programming" as opposed to Remote Procedure Calling, is really different from the way we build services and mobile applications today.

For this reason, and because as far as I know there's not much knowledge about the language and the paradigm online, I decided to write a summary after reading all the content I could find. All resources are linked at the end of the article.

If you haven't heard of General Magic, I highly recommend watching the documentary; here's the trailer:

In case you prefer content in video form, the following may give you an idea.

For an overview video and the earliest mention of the Cloud I can think of see:

An introduction by Andy Hertzfeld at 21:20 in this video:

Another (longer) talk by Andy at Stanford two weeks after the one above, mostly focused on Magic Cap, but it mentions Telescript around 1:06:38:

From now on, most of the text is quoted from the resources linked at the end. Since my personal notes are just a few, I will mark my comments so they look like this:

Hi, this is a comment from the author and not a quote from Telescript resources.

Since each resource attempts to be self-contained, there's a lot of content that is repeated with some variation.

I slightly edited the quoted text to avoid repetition. Emphasis is mine.

The Pitch

The convergence of computers and communication, and advances in graphical user interfaces are placing powerful new electronics products in the hands of consumers.

In principle, such products can put people in closer touch with one another -- for example, by means of electronic postcards; simplify their relationships by helping them make and keep appointments; provide them with useful information such as television schedules, traffic conditions, restaurant menus, and stock market results; and help them carry out financial transactions, from booking theater tickets, to ordering flowers, to buying and selling stock.

Unless public networks become platforms on which third-party developers can build communicating applications, the networks will respond much too slowly to new and varied requirements and so will languish. Unfortunately, today's networks are not platforms.

Telescript enables the creation of a new breed of network that supports the development of communicating applications by making the network a platform for developers. It provides the "rules of the road" for the information superhighway, which leads to the electronic marketplace.

The Electronic Marketplace

Telescript integrates an electronic world of computers and the networks that link them. This world is filled with Telescript places occupied by Telescript agents.

In the electronic world, each place or agent represents an individual or organization in the physical world, its authority. A place's or agent's authority is revealed by its telename, which can't be falsified.

A place, but not an agent, has a teleaddress, which designates the place's location in this electronic world and reveals the authority of the individual or organization operating the computer in which the place is housed.

The typical place is permanently occupied by an agent of the place's authority and temporarily occupied -- visited -- by agents of other authorities.

The Plan

In July 1995, NTT, AT&T, and Sony announced a joint venture to deploy a Telescript-based service in Japan.

In October 1995, France Telecom (the operator of the Minitel electronic marketplace, which supports more than 26,000 merchants and was accessed in 1994 by 18 million users) announced its licensing of Telescript for use in France.

The Language

Telescript is:

  • Object-oriented: As in SmallTalk

  • Complete: Any algorithm can be expressed in the language.

  • Dynamic: A program can define or discover, and then use new classes during execution. While exploring the electronic marketplace, a Telescript agent may encounter an object whose class it has never seen. The agent can take the object home, where it continues to function.

  • Persistent: The Telescript engine secures all its data transparently, even the program counter that marks its point of execution. Thus, a Telescript program persists even in the face of computer failures.

  • Interpreted

  • Portable and safe: A computer executes an agent's instructions through a Telescript engine, not directly. An agent can execute in any computer in which an engine is installed, yet it cannot access directly its processor, memory, file system, or peripheral devices.

  • Communication-centric: Designed for carrying out complex networking tasks: navigation, transportation, authentication, access control, and so on.

Telescript supplements systems programming languages such as C and C++; it does not replace them. Only the parts of an application that require the ability to move from one place in a network to another -- the agents -- are written in Telescript.

Telescript is object-oriented. In fact, like the first object-oriented language, Smalltalk, Telescript is deeply object-oriented. That is, every piece of information, no matter how small, is an object. Thus, even a Boolean is an object.

Like many object-oriented programming languages, Telescript focuses on classes. A class is a "slice" of an object's interface, combined with a related "slice" of its implementation. An object is an instance of a class.

Standard OOP similar to Smalltalk/Java, check Telescript Object Model for details

Process objects provide Telescript's multi-tasking functionality. Processes are pre-emptively multi-tasked and scheduled according to priority.

Telescript implements the following principal concepts:

  • Places

  • Agents

  • Travel

  • Meetings

  • Connections

  • Authorities

  • Permits

Telescript extends the concept of remote programming, the ability to upload and execute programs to a remote processor, with migrating processes. A Telescript mobile agent is a migrating process that is able to move autonomously during its execution to a different processor and continue executing there.

Mobile agents conceptually move the client to the server, where stationary processes, or places, service their requests. When it is done in a place, an agent might choose to move itself to a different processor, carry results back to where it originated, or simply terminate.

Clearly, security is a major concern in this scenario. The operator of a Telescript processor wants some assurance that nothing bad will come of its decision to admit an incoming agent. The host platform wants to know who is responsible for the agent. The agent, on the other hand, would like to trust that private information it is carrying will not be disclosed arbitrarily. It needs to trust the operator of the platform.

Telescript provides some useful Mix-in classes that associate security-relevant attributes with objects of those classes, where the associated functionality is enforced by the engine. These include:

  • Unmoved. An agent cannot take such an object along with it when it does a go. Places, for example, are Unmoved.

  • Uncopied. An attempt to make a copy of such an object returns a reference to the original object rather than creating a copy.

  • Copyrighted. This class is provided as a language extension rather than part of the language. Nonetheless, it is built into engines. An attempt to instantiate such an object will fail during initialization if it is not properly authorized by a suitable Copyright Enforcer object.

  • Protected. Such an object cannot be modified once created, and any reference to such an object is like having a protected reference, except that ownership can be transferred. Packages are Protected.

Unauthorized processes, or processes that are not running under the region's authority, cannot create instances of the following classes:

  • File: A File object can create a handle to any file that the engine can access on the underlying operating system.

  • External Handle: An External Handle object can open a TCP/IP port on the underlying operating system.

  • Control Manager: A Control Manager object can be used to perform a number of management and control operations on an engine. For example, a Control Manager can be used to change attributes of processes, such as their authority, or to halt the engine.

The Current Approach: Remote Procedure Calling

Today networking is based upon remote procedure calling (RPC). A network carries messages -- data -- that either request services or respond to such requests. The sending and receiving computers agree in advance upon the messages they will exchange. Such agreements constitute a protocol.

A client computer with work for a server computer to accomplish orchestrates the work with a series of remote procedure calls. Each call comprises a request, sent from client to server, and a follow-up response, sent from server to client.
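
As a rough contrast in plain Python (no real networking library; a hypothetical in-process FileServer object stands in for the remote side), here is the RPC shape of a file-cleanup chore like the one the next section describes: the client drives every step, and each method call represents a separate request/response round trip.

import datetime

class FileServer:
    """Stand-in for a remote server; each method call represents one network round trip."""

    def __init__(self, files):
        self.files = files  # name -> last-modified date

    def list_files(self):
        return list(self.files)

    def modified_at(self, name):
        return self.files[name]

    def delete(self, name):
        del self.files[name]

def delete_old_files_rpc(server, cutoff):
    # Client-side orchestration: 1 call to list, N calls to inspect, up to N calls to delete.
    for name in server.list_files():
        if server.modified_at(name) < cutoff:
            server.delete(name)

server = FileServer({
    "report.txt": datetime.date(2021, 1, 5),
    "notes.txt": datetime.date(2021, 9, 30),
})
delete_old_files_rpc(server, cutoff=datetime.date(2021, 6, 1))
print(server.files)  # only 'notes.txt' remains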

The New Approach: Remote Programming

A different approach to networking is remote programming (RP). The network carries objects -- data and procedures -- that the receiving computer executes.

The two computers agree in advance upon the instructions from which procedures are formed. Such agreements constitute a language.

A client computer with work for a server computer to accomplish -- say, deleting every file older than two weeks -- orchestrates the work by sending to the server an agent whose procedure makes the required requests (e.g., "delete") using the data (e.g., "two weeks"). Deleting the old files -- no matter how many -- requires just the message that transports the agent between computers. All of the orchestration, including the analysis deciding which files are old enough to delete, is done "on-site" at the server.
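
The remote programming version of the same chore, again as a hedged Python sketch with invented names: the client ships a small procedure (the agent) to the server in a single message, the server runs it against its own files, and only the agent and its result cross the network.

import datetime

class Engine:
    """Stand-in for the server-side engine that receives and executes visiting procedures."""

    def __init__(self, files):
        self.files = files  # name -> last-modified date

    def accept(self, agent_procedure):
        # One message carries the whole procedure; it then works "on-site".
        return agent_procedure(self.files)

def cleanup_agent(files, cutoff=datetime.date(2021, 6, 1)):
    # All the orchestration (deciding which files are old) happens at the server.
    old = [name for name, modified in files.items() if modified < cutoff]
    for name in old:
        del files[name]
    return old  # the agent can carry its results back home

engine = Engine({
    "report.txt": datetime.date(2021, 1, 5),
    "notes.txt": datetime.date(2021, 9, 30),
})
print(engine.accept(cleanup_agent))  # ['report.txt']

Counting messages makes the difference concrete: the RPC sketch above needs one call to list, one per file to inspect, and one per deletion, while the agent version needs a single message (plus the trip home with the results).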

The salient characteristic of remote programming is that client and server computers can interact without the network's help once it has transported an agent between them. Thus, interaction does not require ongoing communication.

The opportunity for remote programming is bidirectional. Server computers, like client computers, can have agents, and a server agent can transport itself to and from a client computer. Imagine, for example, that a client computer makes its graphical user interface accessible to server agents. The client computer might do this, for example, by accepting a form from a server agent, displaying the form to the user, letting the user fill it out, and returning the completed form to the agent. The completed form might instruct a file server's agent to retrieve files meeting specified criteria.

Remote programming is especially well suited to computers that are connected to a network not permanently, but rather only occasionally.

With agents, manufacturers of client software can extend the functionality offered by manufacturers of server software.

Introducing a new RPC-based application requires a business decision on the part of the service provider.

A network using remote programming requires a buying decision on the part of one user. Remote programming therefore makes a network, like a personal computer, a platform.

The Engine

All Telescript engines provide:

  • Runtime type checking with dynamic feature binding

  • Automatic memory management

  • Exception processing

  • Authenticated, unforgeable identity for each process, in the form of an authority

  • Protected references

  • Protection by encapsulation of private properties and features. This forms the basis for object-enforced access controls

  • Quotas and process privileges using permits, including control over creation of new processes

  • Security-oriented mix-in classes (Copyrighted, Unmoved, Protected, Uncopied)

  • Mediated protocols for process rendezvous (for example, entering a place, and Meeting Agents)

With regard to installing bogus classes, the Telescript engine won't admit an agent carrying a class that has the same name as one that's already in the engine, unless it's the same class. In other words, within an engine, class names must be unique.
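
A minimal sketch of that rule (the registry below is my invention, not the engine's actual class table): a class is admitted under a name only if no different class is already installed under that name.

class ClassRegistry:
    """Toy version of an engine's class table: class names must be unique."""

    def __init__(self):
        self.classes = {}

    def admit(self, cls):
        existing = self.classes.get(cls.__name__)
        if existing is None:
            self.classes[cls.__name__] = cls
        elif existing is not cls:
            raise ValueError(f"a different class named {cls.__name__!r} is already installed")
        # the same class arriving again is fine

registry = ClassRegistry()

class Ticket: ...
registry.admit(Ticket)
registry.admit(Ticket)        # ok: identical class

class Bogus: ...
Bogus.__name__ = "Ticket"     # an impostor carrying a colliding name
try:
    registry.admit(Bogus)
except ValueError as error:
    print(error)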

Concepts

Agents

[Image: /images/post/telescript/telescript1.png]

A place is occupied by Telescript agents. Whereas places give the electronic marketplace its static structure, agents are responsible for its dynamic activity.

A Telescript agent is an independent process. The Telescript environment executes the programs of the various agents that occupy the marketplace in parallel with one another.

Two agents must meet before they can interact. One agent initiates the meeting using meet, an instruction in the Telescript instruction set. The second agent, if present, accepts or declines the meeting.

As a consequence of meet, the agents receive references to one another. The references let them interact as peers.

While in the same place, two agents interact by meeting. While in different places, they interact by communicating.

An agent can travel to several places in succession. It might link trips in this way to obtain several services or to select one from among them. Booking theater tickets, for example, might be only the first task on our user agent's to-do list. The second might be to travel to the florist place and there arrange for a dozen roses to be delivered the day of the theater event.

Places

[Image: /images/post/telescript/telescript3.png]

Telescript places lend structure and consistency to the electronic marketplace.

Each place represents, in the electronic world, an individual or organization -- the place's authority -- in the physical world. Several places may have the same authority. A place's authority is revealed by its telename.

Travel

[Image: /images/post/telescript/telescript2.png]

Agents travel using Telescript's go instruction.

The agent need merely present a ticket that identifies its destination. An agent executes go to get from one place to another. After an agent executes go, the next instruction in the agent's program is executed at the agent's destination, not at its source. Thus, Telescript reduces networking to a single program instruction.

If the trip cannot be made (for example, because the means of travel cannot be provided or the trip takes too long), the go instruction fails and the agent handles the exception as it sees fit. However, if the trip succeeds, the agent finds that its next instruction is executed at its destination.

An agent can move from place to place throughout the performance of its procedure because the procedure is written in a language designed to permit this movement.
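
One way to picture go in ordinary Python, as an in-process toy rather than the real engine: the agent records how far it has run, and go hands the whole agent to the destination place, which resumes it at the next step of its procedure.

class Place:
    def __init__(self, name):
        self.name = name

    def accept(self, agent):
        # The destination resumes the agent right after its go.
        agent.here = self
        agent.run()

class TicketAgent:
    """A toy agent whose procedure is a list of steps; 'go' suspends it and moves it."""

    def __init__(self, itinerary):
        self.itinerary = itinerary  # list of (action, destination-or-None)
        self.step = 0
        self.here = None

    def run(self):
        while self.step < len(self.itinerary):
            action, destination = self.itinerary[self.step]
            self.step += 1
            if action == "go":
                destination.accept(self)  # the next step executes at the destination
                return
            print(f"{action} at {self.here.name}")

home = Place("home")
theater = Place("theater place")
florist = Place("florist place")

agent = TicketAgent([
    ("go", theater), ("book tickets", None),
    ("go", florist), ("order roses", None),
    ("go", home), ("report back", None),
])
home.accept(agent)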

Meetings

A meeting lets agents in the same computer call one another's procedures.

Another instruction available to the Telescript programmer is meet, which enables one agent to meet another. The agent presents a petition, which identifies the agent to be met. An agent executes meet whenever it wants assistance. By meeting, the agents receive references to one another that enable them to interact as peers.

The instruction requires a petition, data that specify the agent to be met and the other terms of the meeting, such as the time by which it must begin. If the meeting cannot be arranged (for example, because the agent to be met declines the meeting or arrives too late), the meet instruction fails and the agent handles the exception as it sees fit. However, if the meeting occurs, the two agents are placed in programmatic contact with one another.
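
A hedged sketch of meet in Python rather than Telescript (names invented): the place tracks its occupants, and meet either hands back a reference to the requested peer or fails so the caller can handle the exception.

class MeetingFailed(Exception):
    pass

class Place:
    def __init__(self):
        self.occupants = {}  # agent name -> agent

    def enter(self, agent):
        self.occupants[agent.name] = agent
        agent.here = self

    def meet(self, petition):
        # The petition names the agent to be met; here it is just a name.
        try:
            return self.occupants[petition]
        except KeyError:
            raise MeetingFailed(f"no agent named {petition!r} is present")

class Agent:
    def __init__(self, name):
        self.name = name
        self.here = None

class TicketSeller(Agent):
    def quote(self, show):
        return f"{show}: 2 seats available"

place = Place()
place.enter(TicketSeller("seller"))
buyer = Agent("buyer")
place.enter(buyer)

peer = buyer.here.meet("seller")   # the buyer receives a reference to its peer
print(peer.quote("Friday 8pm"))    # ...and can now call its operations directly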

Connections

Telescript lets two agents in different places make a connection between them. A connection lets agents in different computers communicate.

Connections are often made for the benefit of human users of interactive applications. The agent that travels in search of theater tickets, for example, might send to an agent at home a diagram of the theater showing the seats available. The agent at home might present the floor plan to the user and send to the agent on the road the locations of the seats the user selects.

Permits

Every Telescript place or agent has a permit that limits its capabilities in the electronic marketplace.

Because agents move, their permits, like their credentials, are of special concern. An agent's permit is established when the agent is created programmatically, and it is renegotiated whenever the agent travels between regions. The destination region may deny the agent capabilities that it received at birth as long as the agent is in that region.

Two kinds of capability are granted an agent by its permit. One kind is the right to use a certain Telescript instruction.

Another kind of capability is the right to use a particular Telescript resource, but only in a certain amount. An agent is granted, among other things, a maximum lifetime, measured in seconds (e.g., a 5-minute agent); a maximum size, measured in bytes (e.g., a 1K agent); and a maximum overall expenditure of resources, the agent's allowance, measured in teleclicks (e.g., a 50¢ agent).

Permits provide a mechanism for limiting resource consumption and controlling the capabilities of executing code. A permit is an object (of the built-in class Permit) whose attributes include, among others:

  • age: maximum age in seconds

  • extent: maximum size in octets

  • priority: maximum priority

  • canCreate: true if new processes can be created

  • canGo: true if the affected code can request the go operation

  • canGrant: true if the permit of other processes can be "increased"

  • canDeny: true if the permit of other processes can be "decreased"

Telescript uses four kinds of permits; a sketch of how they might compose follows this list:

  • native permits are assigned by the process creator

  • local permits can be imposed by a place on an entering agent or on a process created in that place. Local permits only apply in that place

  • regional permits are like local permits but imposed by the engine place. Regional permits only apply within a particular engine or set of engines comprising a region

  • temporary permits, which are imposed on a block of code using the Telescript restrict statement
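
A minimal Python sketch of how these permits might compose, assuming the simple rule suggested above that an imposed permit can only narrow what the native permit grants; the attribute handling here is mine, and the real engine's rules are richer.

from dataclasses import dataclass

@dataclass(frozen=True)
class Permit:
    age: int            # maximum age in seconds
    extent: int         # maximum size in octets
    priority: int       # maximum priority
    can_create: bool = True
    can_go: bool = True

    def restrict(self, other: "Permit") -> "Permit":
        """Combine with a local/regional/temporary permit: take the weaker of each capability."""
        return Permit(
            age=min(self.age, other.age),
            extent=min(self.extent, other.extent),
            priority=min(self.priority, other.priority),
            can_create=self.can_create and other.can_create,
            can_go=self.can_go and other.can_go,
        )

native = Permit(age=300, extent=1024, priority=5)                  # assigned at creation
local = Permit(age=60, extent=4096, priority=3, can_create=False)  # imposed by a place
print(native.restrict(local))
# Permit(age=60, extent=1024, priority=3, can_create=False, can_go=True)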

Authorities

Agents and places can discern but neither withhold nor falsify their authorities. Anonymity is precluded.

Telescript verifies the authority of an agent whenever it travels from one region of the network to another. A region is a collection of places provided by computers that are all operated by the same authority.

To determine an agent's or place's authority, an agent or place executes Telescript's name instruction.

The result of the instruction is a telename, data that denote the entity's identity as well as its authority. Identities distinguish agents or places of the same authority.

A place can discern the authority of any agent that attempts to enter it and can arrange to admit only agents of certain authorities.

An agent can discern the authority of any place it visits and can arrange to visit only places of certain authorities.

An agent can discern the authority of any agent with which it meets or to which it connects and can arrange to meet with or connect to only agents of certain authorities.

Telescript provides different ways for identifying the authority and class of the caller (i.e., the requester, or client, from the point of view of the executing code) that are useful in making identity-based access checks. These are obtained directly from the engine in global variables, and include:

  • The current process. An unprotected reference to the process that owns the computation thread that requested the operation.

  • The current owner. An unprotected reference to the process that will own (retain owner references to) any objects created. The owner is usually the current process, but can be temporarily changed, for code executed within an own block, to one's own owner. That is, to the owner of the object that actually supplies the code being executed.

  • The current sponsor. An unprotected reference to the process whose authority will get attached to new process objects, and who will be charged for them. Processes own themselves, so for new process creation, it doesn't make any difference who the current owner is. The engine uses the authority (and permit) of the current sponsor, usually the current process, to determine responsibility for new agents and places.

  • The client. This is the object whose code requested the current operation. The client's owner might be yet another process.

    See An Introduction to Safety and Security in Telescript: Encapsulation and Access Control to see why 4 identities are needed

Protocols

Telescript protocols operate at two levels. The higher level encompasses the encoding (and decoding) of agents, the lower level their transport.

The Telescript encoding rules explain how an agent -- its program, data, and execution state -- are encoded for transport, and how parts of the agent sometimes are omitted as a performance optimization.

The protocol suite can operate over a wide variety of transport networks, including those based on the TCP/IP protocols of the Internet, the X.25 interface of the telephone companies, or even electronic mail.
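
The encoding rules themselves are not reproduced in these sources, but the general shape can be sketched: serialize the agent's program, data, and execution state, then frame the bytes for whatever transport sits underneath. JSON and a length prefix below are my stand-ins, not Telescript's actual wire format.

import json
import struct

def encode_agent(agent: dict) -> bytes:
    """Serialize an agent and prefix it with its length so any byte transport can carry it."""
    body = json.dumps(agent).encode("utf-8")
    return struct.pack("!I", len(body)) + body

def decode_agent(frame: bytes) -> dict:
    (length,) = struct.unpack("!I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))

agent = {
    "program": "shopping_agent_v1",     # which procedure to run at the destination
    "data": {"product": "Canon EOS A2", "max_price_cents": 119999},
    "state": {"next_instruction": 42},  # resume point carried along with the agent
}
frame = encode_agent(agent)
assert decode_agent(frame) == agent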

Example Use Cases

Electronic Mail

An important enterprise in the electronic marketplace is the electronic postal system, which can be composed of any number of interconnected post offices.

Telescript is a perfect vehicle for the implementation of electronic mail systems.

Following the remote programming paradigm, messages, since they are mobile, are implemented as agents. Mailboxes, since they are stationary, are implemented as places. Each mailbox is occupied by an agent of the mailbox's authority. A message's delivery is a transaction between two agents: the sender's message and the agent attending the recipient's mailbox. The transaction transfers the message's content between the two.

A message is customized by the sender, a mailbox by the receiver.
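
As a toy Python sketch of that split (all names invented): the message is the mobile agent, the mailbox is a stationary place attended by the recipient's agent, and delivery is a hand-off between the two.

class MailboxPlace:
    """A stationary place attended by an agent of the mailbox's authority."""

    def __init__(self, owner):
        self.owner = owner
        self.stored = []

    def attendant_receive(self, content, sender):
        # The attending agent decides how the receiver wants messages handled.
        self.stored.append((sender, content))

class MessageAgent:
    """A mobile message: it carries its content to the recipient's mailbox."""

    def __init__(self, sender, content):
        self.sender = sender
        self.content = content

    def deliver_to(self, mailbox: MailboxPlace):
        # The delivery transaction transfers the content between the two agents.
        mailbox.attendant_receive(self.content, self.sender)

inbox = MailboxPlace(owner="chris")
MessageAgent("mary", "Pizza at 7pm on Friday?").deliver_to(inbox)
print(inbox.stored)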

Booking a Round-trip

Chris can thank one Telescript agent for booking his round-trip flight to Boston, another for monitoring his return flight and notifying him of its delay.

Read the details on How Agents Provide the Experience

Buying a Camera

John is in the market for a camera. He's read the equipment reviews in the photography magazines and Consumer Reports and he's visited the local camera store. He's buying a Canon EOS A2. The only question that remains is, from whom? John poses that question to his personal communicator. In 15 minutes, John has the names, addresses, and phone numbers of the three shops in his area with the lowest prices.

Read the details on Doing Time-Consuming Legwork

Planning an Evening

Mary and Paul have been seeing each other for years. Both lead busy lives. They don't have enough time together. But Mary has seen to it that they're more likely than not to spend Friday evenings together. She's arranged -- using her personal communicator -- that a romantic comedy is selected and ready for viewing on her television each Friday at 7 p.m., that pizza for two is delivered to her door at the same time, and that she and Paul are reminded, earlier in the day, of their evening together and of the movie to be screened for them.

Paul and Mary recognize the need to live well-rounded lives, but their demanding jobs make it difficult. Their personal communicators help them achieve their personal, as well as their professional, objectives. And it's fun.

Read the details on Using Services in Combination

Example Code

A shopping agent, acting for a client, travels to a warehouse place, checks the price of a product of interest to its client, waits if necessary for the price to fall to a client-specified level, and returns, either when the price is at that level or after a client-specified period of time.

CatalogEntry: class = (
  public
     see initialize
     see adjustPrice
     product: String;
     price: Integer; // cents
  property
     lock: Resource;
);

initialize: op (product: String; price: Integer) = {
  ^();
  lock = Resource()
};

adjustPrice: op (percentage: Integer) throws ReferenceProtected = {
  use lock {
    price = price + (price*percentage).quotient(100)
  }
};

Warehouse: class (Place, EventProcess) = (
  public
    see initialize
    see live
    see getCatalog
  property
    catalog: Dictionary[String, CatalogEntry];
);

initialize: op (catalog: owned Dictionary[String, CatalogEntry]) = {
  ^()
};

live: sponsored op (cause: Exception|Nil) = {
  loop {
    // await the first day of the month
    time: = Time();
    calendarTime: = time.asCalendarTime();
    calendarTime.month = calendarTime.month + 1;
    calendarTime.day = 1;
    *.wait(calendarTime.asTime().interval(time));

    // reduce all prices by 5%
    for product: String in catalog {
      try { catalog[product].adjustPrice(-5) }
      catch KeyInvalid { }
    };

    // make known the price reductions
    *.signalEvent(PriceReduction(), 'occupants)
  }
};


See The Electronic Shopper for more code

Resources

📚 Instadeq Reading List: May 2021

Here is a list of content we found interesting this month.

  • ✍️ Notation as a Tool of Thought

  • 💭 How can we develop transformative tools for thought?

  • 👩‍🎨 Designerly Ways of Knowing: Design Discipline Versus Design Science

✍️ Notation as a Tool of Thought

  • The importance of nomenclature, notation, and language as tools of thought has long been recognized. In chemistry and in botany the establishment of systems of nomenclature did much to stimulate and to channel later investigation

  • Mathematical notation provides perhaps the best-known and best-developed example of language used consciously as a tool of thought

  • In addition to the executability and universality emphasized in the introduction, a good notation should embody characteristics familiar to any user of mathematical notation:

  • Ease of Expressing Constructs Arising in Problems:

    If it is to be effective as a tool of thought, a notation must allow convenient expression not only of notions arising directly from a problem, but also of those arising in subsequent analysis, generalization, and specialization.

  • Suggestivity:

    A notation will be said to be suggestive if the forms of the expressions arising in one set of problems suggest related expressions which find application in other problems.

  • Subordination of Detail:

    As Babbage remarked in the passage cited by Cajori, brevity facilitates reasoning. Brevity is achieved by subordinating detail

  • Economy:

    The utility of a language as a tool of thought increases with the range of topics it can treat, but decreases with the amount of vocabulary and the complexity of grammatical rules which the user must keep in mind. Economy of notation is therefore important.

    Economy requires that a large number of ideas be expressible in terms of a relatively small vocabulary. A fundamental scheme for achieving this is the introduction of grammatical rules by which meaningful phrases and sentences can be constructed by combining elements of the vocabulary.

💭 How can we develop transformative tools for thought?

  • Retrospectively it’s difficult not to be disappointed, to feel that computers have not yet been nearly as transformative as far older tools for thought, such as language and writing. Today, it’s common in technology circles to pay lip service to the pioneering dreams of the past. But nostalgia aside there is little determined effort to pursue the vision of transformative new tools for thought

  • Why is it that the technology industry has made comparatively little effort developing this vision of transformative tools for thought?

  • Online there is much well-deserved veneration for these people. But such veneration can veer into an unhealthy reverence for the good old days, a belief that giants once roamed the earth, and today’s work is lesser

  • What creative steps would be needed to invent Hindu-Arabic numerals, starting from the Roman numerals? Is there a creative practice in which such steps would be likely to occur?

  • The most powerful tools for thought express deep insights into the underlying subject matter

  • Conventional tech industry product practice will not produce deep enough subject matter insights to create transformative tools for thought

  • The aspiration is for any team serious about making transformative tools for thought. It's to create a culture that combines the best parts of modern product practice with the best parts of the (very different) modern research culture. [Diagram: 'insight' and 'making', pointing to each other in a loop.] You need the insight-through-making loop to operate, whereby deep, original insights about the subject feed back to change and improve the system, and changes to the system result in deep, original insights about the subject.

    People with expertise on one side of the loop often have trouble perceiving (much less understanding and participating in) the nature of the work that goes on on the other side of the loop. You have researchers, brilliant in their domain, who think of making as something essentially trivial, “just a matter of implementation”. And you have makers who don’t understand research at all, who see it as merely a rather slow and dysfunctional (and unprofitable) making process

  • Why isn’t there more work on tools for thought today?

  • It is, for instance, common to hear technologists allude to Steve Jobs’s metaphor of computers as “bicycles for the mind”. But in practice it’s rarely more than lip service. Many pioneers of computing have been deeply disappointed in the limited use of computers as tools to improve human cognition

    Our experience is that many of today’s technology leaders genuinely venerate Engelbart, Kay, and their colleagues. Many even feel that computers have huge potential as tools for improving human thinking. But they don’t see how to build good businesses around developing new tools for thought. And without such business opportunities, work languishes.

  • What makes it difficult to build companies that develop tools for thought?

  • Many tools for thought are public goods. They often cost a lot to develop initially, but it’s easy for others to duplicate and improve on them, free riding on the initial investment. While such duplication and improvement is good for our society as a whole, it’s bad for the companies that make that initial investment

  • Pioneers such as Alan Turing and Alonzo Church were exploring extremely basic and fundamental (and seemingly esoteric) questions about logic, mathematics, and the nature of what is provable. Out of those explorations the idea of a computer emerged, after many years; it was a discovered concept, not a goal. Fundamental, open-ended questions seem to be at least as good a source of breakthroughs as goals, no matter how ambitious

  • There’s a lot of work on tools for thought that takes the form of toys, or “educational” environments. Tools for writing that aren’t used by actual writers. Tools for mathematics that aren’t used by actual mathematicians. Even though the creators of such tools have good intentions, it’s difficult not to be suspicious of this pattern. It’s very easy to slip into a cargo cult mode, doing work that seems (say) mathematical, but which actually avoids engagement with the heart of the subject

  • Good tools for thought arise mostly as a byproduct of doing original work on serious problems

👩‍🎨 Designerly Ways of Knowing: Design Discipline Versus Design Science

  • A desire to “scientise” design can be traced back to ideas in the twentieth century modern movement of design

  • We see a desire to produce works of art and design based on objectivity and rationality, that is, on the values of science

  • The 1960s was heralded as the “design science decade” by the radical technologist Buckminster Fuller, who called for a “design science revolution” based on science, technology, and rationalism to overcome the human and environmental problems that he believed could not be solved by politics and economics

  • We must avoid swamping our design research with different cultures imported either from the sciences or the arts. This does not mean that we should completely ignore these other cultures. On the contrary, they have much stronger histories of inquiry, scholarship, and research than we have in design. We need to draw upon those histories and traditions where appropriate, while building our own intellectual culture, acceptable and defensible in the world on its own terms. We have to be able to demonstrate that standards of rigor in our intellectual culture at least match those of the others