All Things Simulation and ExtendSim: Imagine That

Showing posts with label Imagine That. Show all posts

Thursday, July 2, 2020

Race Conditions

written by Dave Krahl, QMT Group

In computer science and engineering a race condition is "an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly". In discrete event simulation programs such as ExtendSim, a race condition is when multiple events happen at the same simulated time and the order of these events has an impact on the operation of the model. Because discrete event simulators process events at discrete times, race conditions are fairly common. There is no standard solution to this, and it is handled differently by the various simulation software programs. When an event occurs in a discrete event simulation model any number of actions can be triggered. The order of these actions can have a significant effect on the behavior of the model. A classic example is if a resource is released and the item (or entity) releasing that resource requests the same resource again and other items are waiting for that resource. Is the releasing item in contention for the resource?

ExtendSim has a number of features that provide additional control over the sequence and scheduling of events as well as the transmission of messages. These include:

Scheduling a 0 time event in an equation before it is evaluated. This provides the ability to delay the calculation of the equation and any subsequent messages sent by the equation returning control to the block that initiated the calculation. This can be implemented in custom blocks as well.
Detailed control of messages in the equation and custom blocks. Whether or not a connector responds to a message is a user-defined option.
In custom blocks, there are message handlers for every type of interaction. These can be used to control how other messages are sent out.

While there are numerous tools for solving race condition problems, the biggest challenge is detecting them in the first place. This often requires detailed inspection of the results of an event in the simulation. Some tools available for this in ExtendSim are:

Tracing of simulation execution
Enabling debugging code and in particular examining the "stack" of function calls and message handlers
Animation of item movement
Record Message block in the ExtendSim Utilities library shows sequences of messages
History block that shows item movement and properties.

Race conditions are an artifact of discrete event simulation as a technology. This issue is something that every experienced simulation modeler has encountered in their modeling experience. Identifying the race condition can be challenging. However, tools exist in simulation programs to address any race condition problems and control the execution of the model.

Wednesday, May 10, 2017

Improving Performance in Resource-Constrained Systems

I worked on a simulation study last year with Dr. Holt (University of Washington) and Dr. Srinivasan (University of Tennessee). The results of the study surprised me. It made me start thinking differently about variation and its effect on systems. It might change the way you look at bottlenecks in a resource-constrained system as well.

We modeled a multi-project system, but the results we found can be applied to any system. This multi-project system (think of the projects as engineering projects) required various resources at different stages. We modeled a variety of project structures. The projects we modeled used several common resources. The primary output we studied was the project flow time, or the time it takes a project to complete the system.

It is very difficult to correctly determine the appropriate workload that should be placed upon resources in this environment. There is no question that putting too little work into the system will tend to starve key resources. And while there is pressure to keep resources busy, overloading them usually results in unfavorable outcomes like projects taking too long.

In an ideal setting, work schedules can be developed in advance, so that resources have just the right amount of work allocated to them at various points in time. However, in the project world, demand is highly uncertain, workflow is quite unpredictable, and task durations have significant variability. Even the best-planned schedules become difficult to execute in this environment. And when many different resources are used multiple times in a single project and frequently shared between projects, any unexpected delay in a single task can cause significant ripple effects delaying one or more projects. Even a small delay in a task far away from a key resource can cause chaos in the complicated and interrelated schedules that exist in a project environment, and attempts to tightly schedule projects are soon abandoned.

Our study outlined several steps to dramatically improve the performance of these organizations and I want to talk about two of them here. 1. Determine how resources should be loaded 2. Identify the appropriate level of reserve resources

The first step is not a new concept. This is basically controlling the amount of work in the system, and there are several approaches to implement this. In manufacturing, you might refer to it as CONWIP or Constant Work in Process. In the project-management environment, the term is CONPIP or Constant Projects in Process. We applied a slightly different mechanism, but it had a similar effect as CONWIP or CONPIP. In our system, we monitored the backlog of work for the resources. This backlog would generally not be completely present in the immediate queue of work at that resource. We would only release new work into the system once the bottleneck resource was below a specified threshold.

The chart below shows the first set of results from the resource loading study. The X-Axis shows the resource workload. For example, at 100 percent, the bottleneck resource has enough work in the system to keep it busy for 36 days. At 200 percent, the bottleneck resource has enough work in the system to keep it busy for 72 days. The blue line shows the effect that an increased workload has on the average flow time. As more work is pushed into the system, the average flow time increases. Increased flow time means it takes longer for projects to complete, so customers are much less happy! Longer flow times can be detrimental to a company. The black line is plotting the throughput. With an increased workload in the system, more projects can be completed, but at a certain point, the increase is negligible.

The red line is something we called the project value index. This is defined as the number of projects completed over a given period divided by the 90-percent probable flow time. The project value index is a value we want to maximize (more projects completed while decreasing flow time). We tend to want to be just a bit to the right of the high point on the project value index. This is a good balance of throughput.

The next issue we studied was the use of additional resources. The results of this study are what really surprised me. The typical thought process for improving a system is to add resources to the bottleneck. Then to keep improving the system, you would find the next bottleneck to add resources to. This feels like the natural progression for improving projects, right? In the system studied, we did have a clear bottleneck. We had nine resources. When the bottleneck resource was at 100 percent utilization, the other resources ranged from 50 percent to 75 percent.

Another strategy we tried was to use an Expert resource. This is a resource that can be used to help any other resource, not just the bottleneck. This would be the most experienced staff member that can do everything. We didn’t want this expert resource working just at the bottleneck resource. The task durations were all random. We let the expert resource help any resource when the task was taking longer than the expected value. These expert resources would ONLY be requested for help after the task duration had exceeded the expected mean value. For example, let’s say the expected task duration was six days. If the task was not complete by day six, then the expert resource would be requested to help complete the task to “shorten the long tail” of the task duration. The expert resource is specifically used to reduce the long tail on the right side of the service time distribution.

In the chart below, we used the Project Value Index to compare the two strategies of a) adding an Expert resource, which helps reduce the long task times and b) adding a resource at the bottleneck. As you can see, using the Expert resource had a significantly better impact! Wow. I did not expect this.

Here is what I learned from this study: When a task consumes a resource for an excessive amount of time, it not only delays this project from completing but it also delays every project in the queue for this resource. So long-tail tasks have an impact on potentially all the projects in the system, not just on the individual project. Focusing on these long-tail tasks, even on non-bottleneck processes, has a bigger impact on the system than just focusing on improving the bottleneck process.

That is something you should noodle on. This concept can of course be applied not only to project-management systems but also to many other resource-constrained systems.

Friday, March 9, 2012

A Day at "The Park" with ExtendSim

“Why are you collecting this data?”, I asked. It seemed a reasonable question given my current location. You see my friends and I were hiking to the top of Half Dome in Yosemite National Park, and we were currently about 7 miles from the trailhead. Imagine our surprise when a formally dressed college student (complete with official ID badge) asked us if we were willing to participate in a trail study. We were supposed to be in the middle of nowhere! But there she was sitting off to the side of the trail at a foldout card table that was holding a stack of time cards, a clipboard and some stickers.

“We’re trying to understand the traffic patterns along this route to Half Dome.”, she answered. "Participants show their time cards to officials stationed at various waypoints along the way. The officials will mark your time of arrival at each waypoint. Would you like to participate?”

“Perhaps.”, I said. “This sounds like you’re collecting data for a simulation study. Is that true?”

“Yes.”

“What simulation tool are you using?”, I asked.

“Umm….I think it’s called ExtendSim. Yeah. That’s it. It’s called ExtendSim.”

I smiled and said, “Well, in that case we’d be very happy to participate in your study.” ;-)

So the next time someone asks you, “Where is ExtendSim used?”, maybe the most appropriate response is, “Where is it not used?”