Introduction to SAP ABAP internal tables

In this SAP Press book chapter download, find an overview of SAP ABAP internal tables. Learn about the different ABAP table types available.

7 Processing of Internal Tables

    Requires Free Membership to View

Internal tables are among the most complex data objects available in the ABAP
environment. The use of internal tables lets you store dynamic datasets in the main
memory. Internal tables are comparable to arrays and they spare the programmer
the effort of program-controlled memory management thanks to their dynamic
nature. The data in internal tables is managed per row, whereas each row has the
same structure.

In most cases, internal tables are used for the buffering or formatting of contents
from database tables. The type of access to internal tables plays an important role
for performance, as is the case with database tables. Experience shows that the
tuning of internal tables enables similarly major effects as the tuning of database
accesses. The negative effects of inefficient accesses to internal tables for the overall
system can be compensated more easily than inefficient database accesses by
adding further CPUs or application servers. Inefficient database accesses affect
the database as a central resource, whereas inefficient accesses to internal tables
impact the better scalable application layer (see Chapter 2).

The following sections first provide a general overview of the internal tables. This
is followed by a description of how the internal tables are organized in the main
memory. The subsequent section discusses the different types of internal tables.
The major part of this chapter then details the performance aspects for the processing
of internal tables. Typical problematic examples and solution options are
presented here.

7.1 Overview of Internal Tables

Internal tables are completely specified by four properties:

1. Table type
The access type to the table type determines how ABAP accesses the individual
table rows. Section 7.3, Table Types, discusses this topic in great detail.
2. Row type
The row type of an internal table can be any ABAP data type.
3. Uniqueness of the key
The key can be specified as unique or non-unique. In case of unique keys, there
are no multiple entries (regarding the key) in the internal tables. The uniqueness
is based on the table type. Standard tables only allow for non-unique keys
and hashed tables only for unique keys.
4. Key components (taking the sequence into account)
The key components and their sequence specify the criteria based on which the
table rows are identified.

Figure 7.1 illustrates this syntactically.

Figure 7.1 Internal Tables — Declaration

The combination of access type and table type is mainly relevant for the performance.
Section 7.3, Table Types, discusses the various access types and table types.

Before describing the table types in detail, let’s first discuss the organization of
internal tables in the main memory.

7.2 Organization in the Main Memory

In the main memory, the internal tables, just like the database tables, are organized
in blocks or pages. In the context of internal tables, the following sections use the
term pages.

When an internal table is declared in an ABAP program, the system only creates
a reference (table reference) in the main memory initially. Only when entries are
written to the table does the system create a table header and a table body. Figure
7.2 shows a schematic diagram of the organization in the main memory.

Figure 7.2 Schematic Diagram of the Organization of the Internal Tables in the Main Memory

The table header has a reference to the first page of the table body and another
reference to page management. Page management manages the addresses of the
pages in the main memory.

The table reference currently occupies 8 bytes of memory space. The table header
occupies about 100 bytes of memory space depending on the platform. The space
required for page management depends on the number of pages.

The table body consists of pages that can include the table rows. The first two
pages are — depending on the row length and other factors — usually smaller than
the pages 3 to n (if the row lengths are not so long that the maximum page size is
reached already at the beginning).

As of the third page, the pages are created with the maximum page size, which is
usually between 8 KB and 16 KB. This depends on the length of the row. Unlike
database tables, the access is not per page but per row. So if you access a row of an
internal table, the system reads only one row. The effort for searching table entries
(or data records) is comparable to the database tables. For this purpose, the index
or hash administration provides support for the internal tables. You learn more
about internal tables in Section 7.3, Table Types, for the table types because they
are directly related to this topic.

The table header includes the most important information about an internal table.
For example, you can quickly query the number of rows using DESCRIBE TABLE
<itab> LINES <lines> or the integrated function, LINES( itab ), from the table
header.

As very small internal tables with only a few rows can result in wastage due to the
memory use of the automatically calculated first page, INITIAL SIZE is added for
the declaration of internal tables. It can provide information on the size of the first
page, so a smaller memory allocation than in the standard case occurs.

However, if considerably more rows are required than originally specified for INITIAL
SIZE, the third page is created faster with the maximum page size. For example,
if 4 was specified for INITIAL SIZE, the third page may already be required as
of the 13th row if the second page is twice as large as the first page. Relatively few
rows (13, for example) require relatively much memory (three pages, third page
with a size of 8 to 16 KB), whereas one page would have been sufficient if a higher
value (for example, 14) had been specified for INITIAL SIZE. Consequently, for
small tables it is important that INITIAL SIZE is not selected too small. Select a value
that provides sufficient space in the first (or first and second) page for most cases.

INITIAL SIZE should always be specified if you require only a few rows and the
internal table exists frequently. For nested tables, if an internal table is part of a
row of another internal table, this is likely for the inner internal table. It can also
occur for attributes of a class if there are many instances of this class.

Caution: INITIAL SIZE and APPEND SORTED BY

In conjunction with the APPEND wa SORTED BY comp command, the INITIAL SIZE addition
not only has a syntactic but also a semantic meaning (see documentation). However,
don’t use the APPEND wa SORTED BY comp command; instead, work with the SORT
command.

Depending on the table type or type of processing, you also require a management
for the access to the row, that is, an index for the index tables and a hash
administration for the hashed tables. At this point, memory may be required for
the management of entries in addition to the pages. This management also occupies
memory. Both in the Debugger and in the Memory Inspector, this memory is
added to the table body and not displayed separately. Compared to the user data,
this management can generally be neglected.

But how can you release allocated space in the internal tables again? The deletion
of individual or multiple rows from the internal table using the DELETE itab
command doesn’t result in any memory release. The rows concerned are only
“selected” as deleted and not deleted from the pages.

Only when you use the REFRESH or CLEAR statements the system does release
the pages of the internal tables again. Only the header and a small memory area
remain.

Note

In this context, released means that the occupied memory can be reused. As the memory
allocation from the Extended Memory (EM) for a user is usually done in blocks (see Section
6.1 in Chapter 6), which are considerably larger than the pages of an internal table,
this is referred to as a two-level release. Release initially means that the pages within an
EM block are released and this space can then be reused by the same user. Only if the
EM block is completely empty and doesn’t contain any data (variables, and so on) of the
user any longer is this block returned to the SAP memory management and available for
the other users again.

The FREE itab ABAP statement, however, results in the complete de-allocation of
the table body, that is, all pages and the index (if available) of the internal tables
are released. Additionally, the table header is added to a system-internal “free list”
for reuse.

If an internal table should be reused, it is advisable to use REFRESH or CLEAR instead
of FREE because this way the creation of the first page can be omitted. If a large part of the rows of an internal table was deleted using DELETE and the occupied
memory should be released, it is recommended to copy the table rows. A simple
copy to another internal table is not sufficient because of table sharing, which is
discussed in Section 7.4, Performance Aspects. Alternatively, you can revert to
ABAP statements (INSERT or APPEND) or to the EXPORT/IMPORT variants (see Section
6.2.2 in Chapter 6) for copying. In the context of performance, the “release” of
memory only plays a secondary role (as long as no memory bottleneck exists in
the system). In contrast to fragmented database tables, fragmented internal tables
have no negative effects on the performance because the entries can always be
addressed efficiently because internal tables are always managed per row.

Background: Difference between Internal Tables and Database Tables

Internal tables can be compared to database tables in many respects, but there is one
major difference:

Internal tables are always processed on a row basis, whereas database tables are always
processed on a set basis. A set-based processing, possible with Open SQL on database
tables, is not possible on internal tables because the single row is the main processing
criterion for internal tables, whereas a set of data records is the main processing criterion
for database tables. Set-based accesses to internal tables, for instance, LOOP ... WHERE
or DELETE ... WHERE, are emulated by the ABAP VM and can be mapped in an optimized
way for some table types (see Section 7.4, Performance Aspects). More complex,
set-based operators, such as joins and aggregates,… are not possible on internal tables.
They must be programmed using the existing ABAP language techniques.

After you’ve learned about the organization of internal tables in the main memory,
the next section focuses on the organization of internal tables and discusses the
different types of internal tables.

7.3 Table Types

Internal tables can be subdivided into index tables and hashed tables. The index
tables, in turn, can be divided into standard tables and sorted tables. Figure 7.3
shows an overview of the table types.

Figure 7.3 Overview of the Table Types

The table type specifies how you can access individual table rows via ABAP.

For standard tables, the access can be implemented via the table index or a “key.”
For a key access, the response time depends linearly on the number of table entries
because the read access corresponds to a linear scan of the entries, which is canceled
after the first hit. The key of a standard table is always non-unique. If no key
is specified, the standard table receives the default key, which is a combination of
all character-like fields.

Sorted tables are always sorted by the key. The access can be carried out via the
table index or the key. For a key access, the response time depends logarithmically
on the number of table entries because the read access is carried out via a binary
search. The key of sorted tables can be unique or non-unique. On sorted tables,
you can process partial keys (initial parts of the complete key) in an optimized
manner. An over-specification of the table key is also possible; only the components
of the key are used for the search and the remaining components are then
utilized for the filtering.

Standard tables and sorted tables are also referred to as index tables because both
tables can be accessed using the table index.

The read access to hashed tables is only possible by specifying a key. Here, the
response time is constant and doesn’t depend on the number of table entries
because the access is carried out via a hash algorithm. The key of hashed tables
must be unique. Neither explicit nor implicit index operations are permitted on
hashed tables. If a hashed table is accessed with a “key” that is different to the
unique table key, the table is handled like a standard table and searched linearly
according to the entries. This is also the case for a partial key. Different to the
sorted table, this partial key cannot be optimized for the hashed table. Over-specified
keys are processed in an optimized manner.

By means of the DESCRIBE TABLE <itab> KIND <k> statement, you can determine
the current table type at runtime. Of course, this is also possible using Run Time
Type Identification (RTTI).

An index or a hash administration is available for the efficient management or
access optimization of internal tables. The following section describes which types
are available and when they are created.

Index Tables

Indexes for index tables are only created when the physical sequence no longer
corresponds to the logical sequence, that is, when one of the INSERT, DELETE, or
SORT statements is executed on the table and the following conditions apply:

1. INSERT
The entry to be inserted should be inserted before an already existing entry.
(An INSERT statement that inserts behind the last record largely corresponds to
an APPEND statement.)
2. DELETE
The entry to be deleted is not the last entry of the table.
3. SORT
The table has a certain size and is sorted.

An index is used for the efficient index access in the “logical sort sequence” or the
efficient finding of “valid rows” if the table pages have gaps due to deletions. By
means of the index, the logical sequence of the table is mapped on the physical
memory addresses of the entries.

An index is available in two types:
1. As a linear index
2. As a tree-like index

The index structure is always maintained without any gaps, whereas the table
pages may have gaps due to the deletion of records. In comparison to the management
of the index without gaps, the management of the table pages without gaps
would be too time consuming for larger tables.

Due to the management of the index structure without gaps, the insertion and
deletion of records incur movement costs because the existing entries must be
moved. Strictly speaking, these costs are overheads for copying. For large indexes
(as of about 5,000 entries), they get dominant; this is why a tree-like index is created
for large tables.

In addition to the index, a free list exists that manages the addresses of the entries
that were deleted using DELETE for reuse.

Figure 7.4 shows a schematic diagram of a linear index.

Figure 7.4 Schematic Diagram of a Linear Index

Whether a tree-like index is created depends on system-internal rules, for example,
the number of entries (to be expected), and other factors. Figure 7.5 shows a
schematic diagram of a tree-like index.

Figure 7.5 Schematic Diagram of a Tree-Like Index

For the tree-like index, the index entries are organized in leaves. The previously
mentioned movement or copy costs only incur at the leaf level. The index doesn’t
have to be allocated at once; you only require continuous memory at leaf level.
In return, you must first navigate through the tree structure when you access the
index to reach the respective index entry.

Apart from that, the tree-like indexes on index tables are comparable to the database
indexes presented in Chapter 5. A tree-like index requires about 50% more
space than a linear index.

If the logical sequence of entries corresponds to the physical sequence in the main
memory when the index tables are processed, you don’t need to create an index.
In this case, the insertion sequence corresponds to the physical sequence, and the
table was filled with a sorting and not deleted or sorted. If no index is necessary,
the internal table requires less memory.

Hash Administration

The hash administration is based on the unique key of the table. The hash administration
is created for hashed tables only. It is established using the unique key of
the internal table. Index accesses (for example, second entry of the internal table)
are not possible, hashed tables can only be accessed with the key.

For the hashed table, each key value is assigned to a unique number using a hash
function. For this number, the memory address of the respective data record is
stored in a corresponding hash array.

If a DELETE or SORT is executed on a hashed table, you must create a doublelinked
list (previous and next pointer), so sequential accesses (LOOP) via the data
are still possible according to the insertion sequence (or in a sort sequence generated
using SORT). The double-linked list requires about 50% more space for the
hash administration.

Figure 7.6 shows a schematic diagram of a hash administration.

Figure 7.6 Schematic Diagram of a Hash administration

Limitations

Besides the memory that is available to the user, there are further limitations for
internal tables:

A limit for the number of rows in internal tables results because they are addressed
internally and in ABAP statements via 4 byte integers, which limits them to
2,147,483,647 entries.

The size of hashed tables is further limited by the biggest memory block available
at once. The maximum size is 2 GB, but it is usually further limited by the
ztta/max_memreq_MB profile parameter. The maximum number of rows of hashed
tables depends on the required size of the hash administration that must be stored
there.

The actual maximum size of internal tables is usually smaller than specified by the
previous limits because the overall available memory is usually not only used by a
string or an internal table (see ABAP documentation: Maximum size of dynamic
data objects).

Summary of the Table Types

Table 7.1 lists the most important characteristics of the table types. This is followed
by a recommendation for when you should use which table type.

  Standard Table Sorted Table Hashed Table
Possible Accesses Index access or key access Index access or
key access
Key access
Uniqueness Non-unique Non-unique or
unique
Unique
Optimal Access Index or binary search (if the
table is sorted by the search
components)
Index or key Key


Table 7.1 Characteristics of Table Types

Standard tables should only be used if all entries should be processed sequentially
after filling or if the internal tables should be accessed flexibly and efficiently using
multiple different keys. For this purpose, the table must be sorted by the search
field and scanned using the binary search. The resorting is carried out only as often
as necessary. If a resorting is only required for one or a few read accesses, the sort
times far outweigh the time savings for reading. Use key accesses without binary
search only for small tables or better avoid them completely. If you search only via
a specific field, use a sorted or hashed table.

Sorted tables are particularly suited for partially sequential processing, for example,
when a small part of a table should be processed via key accesses for which
only the initial part of the key is given. Key accesses to the table key can also be
carried out efficiently by the sorted tables.

Hash tables are optimal if you access only via the table key. If the key has a high
left significance
, you can also use a unique sorted table because in this case performance
benefits arise for the binary search when you access individual rows. In this
context, left significance means that the selective part of a key should be positioned
at the beginning of the key (as far to the left as possible).

 

This was first published in February 2011

Join the conversationComment

Share
Comments

    Results

    Contribute to the conversation

    All fields are required. Comments will appear at the bottom of the article.