9. How to handle WelDX files

In this notebook we will demonstrate how to create, read, and update ASDF files created by WelDX. All the needed functionality is contained in a single class named WeldxFile. We are going to show different modes of operation, such as working with physical files on your hard drive and with in-memory files, in both read-only and read-write mode.

9.1. Imports

The WeldxFile class is imported from the top level of the weldx package.

[1]:
from datetime import datetime

import numpy as np

from weldx import WeldxFile

9.2. Basic operations

Now we create our first file by invoking the WeldxFile constructor without any additional arguments. By doing so, we create an in-memory file. This means that your changes will be temporary until you write them to an actual file on your hard drive. The file_handle attribute points to the actual underlying file. In this case it is the in-memory file or buffer, as shown below.

[2]:
file = WeldxFile()
file.file_handle
[2]:
<_io.BytesIO at 0x7f3435bd0900>
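
To make the notion of an in-memory file more concrete, here is a minimal sketch that uses only the Python standard library. It does not use WeldxFile at all; it merely illustrates how an io.BytesIO buffer behaves like a file without touching the disk, and how its content only becomes persistent once it is copied to a real file. The file name buffer_copy.bin and the payload are placeholders chosen for this illustration.

```python
import io
import tempfile
from pathlib import Path

# An in-memory buffer offers the same read/write interface as a file on disk.
buffer = io.BytesIO()
buffer.write(b"temporary content")

# Nothing has been written to disk so far; the bytes live only in memory.
content = buffer.getvalue()

# Persist the buffer by copying its bytes into a real file.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "buffer_copy.bin"  # placeholder file name
    target.write_bytes(content)
    restored = target.read_bytes()

print(restored == content)
```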

Next we assign some dictionary-like data to the file by storing it under a key name enclosed in square brackets. Then we look at the representation of the file header or contents. This will depend on the execution environment. In JupyterLab you will see an interactive tree-like structure, which can be expanded and searched. The root of the tree is denoted as “root”, followed by the children “asdf_library” and “history” created by the ASDF library. We attached the additional child “some_data” with our assignment.

[3]:
data = {"data_sets": {"first": np.random.random(100), "time": datetime.now()}}
[4]:
file["some_data"] = data
file
<IPython.core.display.JSON object>

Note that here we are using some very common types, namely a NumPy array and a timestamp. For weldx specialized types like the coordinate system manager, (welding) measurements etc., the weldx package provides ASDF extensions to handle those types automatically when loading and saving ASDF data. You do not need to worry about them. If you try to save types that cannot be handled by ASDF, you will trigger an error.

We could also have created the same structure in one step:

[5]:
file = WeldxFile(tree=data, mode="rw")
file
<IPython.core.display.JSON object>

You might have noticed that we got a warning about the in-memory operation when showing the file in Jupyter. Now we have passed the additional argument mode="rw", which indicates that we want to perform write operations, either just in memory or, alternatively, on the passed physical file. So this warning went away.

We can use all dictionary operations on the data we like, e.g. update, assign, and delete items.

[6]:
file["data_sets"]["second"] = {"data": np.random.random(100), "time": datetime.now()}

# delete the first data set again:
del file["data_sets"]["first"]
file
<IPython.core.display.JSON object>

We can also iterate over all keys as usual. Have a look at the documentation of the built-in type dict for a complete overview of its features.

[7]:
for key, value in file.items():
    print(key, value)
data_sets {'time': datetime.datetime(2022, 4, 29, 10, 56, 18, 229145), 'second': {'data': array([0.08290113, 0.67176871, 0.36163109, 0.95558127, 0.34854549,
       0.67029218, 0.23244161, 0.54969868, 0.42933861, 0.75289415,
       0.27876322, 0.67056226, 0.29786343, 0.74748835, 0.16341803,
       0.78636928, 0.77439332, 0.07472758, 0.06101694, 0.86830158,
       0.16998563, 0.74648879, 0.34422219, 0.30685028, 0.95081471,
       0.93930325, 0.38917039, 0.24356663, 0.72746839, 0.61273222,
       0.85159376, 0.77812614, 0.53457237, 0.05330211, 0.99281437,
       0.39833399, 0.78694306, 0.66274433, 0.85329567, 0.78223288,
       0.40638501, 0.72378483, 0.63232857, 0.42412561, 0.76286365,
       0.21961847, 0.6816895 , 0.69245366, 0.85371115, 0.92213178,
       0.73879154, 0.86807386, 0.85927318, 0.57415821, 0.04366165,
       0.14436299, 0.7895759 , 0.06072086, 0.70606308, 0.50005282,
       0.09171312, 0.36873069, 0.56847015, 0.12995714, 0.32244831,
       0.12105478, 0.47563365, 0.24134029, 0.57551414, 0.21104279,
       0.86743346, 0.36775765, 0.74042131, 0.58652657, 0.18182958,
       0.86547013, 0.46170681, 0.85604338, 0.9570348 , 0.47935317,
       0.44403641, 0.80220118, 0.51754679, 0.63987364, 0.77511394,
       0.8670231 , 0.76244222, 0.21441361, 0.57243396, 0.06338992,
       0.83250239, 0.88438564, 0.43999783, 0.7300321 , 0.80616475,
       0.1468618 , 0.92663828, 0.67599511, 0.52990767, 0.66737592]), 'time': datetime.datetime(2022, 4, 29, 10, 56, 18, 401582)}}

9.2.1. Access to data by attributes

Access by key names can be tedious when deeply nested dictionaries are involved. We therefore provide access via attributes, like this:

[8]:
accessible_by_attribute = file.as_attr()
accessible_by_attribute.data_sets.second
[8]:
{'data': array([0.08290113, 0.67176871, 0.36163109, 0.95558127, 0.34854549,
        0.67029218, 0.23244161, 0.54969868, 0.42933861, 0.75289415,
        0.27876322, 0.67056226, 0.29786343, 0.74748835, 0.16341803,
        0.78636928, 0.77439332, 0.07472758, 0.06101694, 0.86830158,
        0.16998563, 0.74648879, 0.34422219, 0.30685028, 0.95081471,
        0.93930325, 0.38917039, 0.24356663, 0.72746839, 0.61273222,
        0.85159376, 0.77812614, 0.53457237, 0.05330211, 0.99281437,
        0.39833399, 0.78694306, 0.66274433, 0.85329567, 0.78223288,
        0.40638501, 0.72378483, 0.63232857, 0.42412561, 0.76286365,
        0.21961847, 0.6816895 , 0.69245366, 0.85371115, 0.92213178,
        0.73879154, 0.86807386, 0.85927318, 0.57415821, 0.04366165,
        0.14436299, 0.7895759 , 0.06072086, 0.70606308, 0.50005282,
        0.09171312, 0.36873069, 0.56847015, 0.12995714, 0.32244831,
        0.12105478, 0.47563365, 0.24134029, 0.57551414, 0.21104279,
        0.86743346, 0.36775765, 0.74042131, 0.58652657, 0.18182958,
        0.86547013, 0.46170681, 0.85604338, 0.9570348 , 0.47935317,
        0.44403641, 0.80220118, 0.51754679, 0.63987364, 0.77511394,
        0.8670231 , 0.76244222, 0.21441361, 0.57243396, 0.06338992,
        0.83250239, 0.88438564, 0.43999783, 0.7300321 , 0.80616475,
        0.1468618 , 0.92663828, 0.67599511, 0.52990767, 0.66737592]),
 'time': datetime.datetime(2022, 4, 29, 10, 56, 18, 401582)}
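
The idea behind attribute access can be sketched with a small wrapper class. This is not the implementation used by weldx; the class name AttrView and the sample tree are hypothetical and only illustrate how attribute lookups can be forwarded to the keys of a nested dictionary.

```python
class AttrView:
    """Read-only attribute-style view on a nested dictionary (illustrative only)."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __getattr__(self, name):
        # __getattr__ is only called when the normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dictionaries so that chained access keeps working.
        return AttrView(value) if isinstance(value, dict) else value


# Hypothetical sample tree, loosely modeled on the structure used above.
tree = {"data_sets": {"second": {"data": [0.1, 0.2], "time": "2022-04-29"}}}
view = AttrView(tree)

print(view.data_sets.second.data)  # same value as tree["data_sets"]["second"]["data"]
```

Unlike the output shown above, this sketch wraps nested dictionaries in another view instead of returning them as plain dictionaries; that is a design choice of the sketch to keep chained access short.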

9.3. Writing files to disk

In order to make your changes persistent, we are going to save the memory-backed file to disk by invoking WeldxFile.write_to.

[9]:
file.write_to("example.asdf")
[9]:
'example.asdf'

This newly created file can be opened again in read-write mode by passing the appropriate arguments.

[10]:
example = WeldxFile("example.asdf", mode="rw")
example["updated"] = True
example.close()

Note that we closed the file here explicitly. Before closing, we wrote a simple item to the tree. But let's see what happens if we open the file once again.

[11]:
example = WeldxFile("example.asdf", mode="rw")
display(example)
example.close()
<IPython.core.display.JSON object>

As you can see, the updated state has been written, because we closed the file properly. If we omitted closing the file, our changes would be lost when the object goes out of scope or Python terminates.

9.4. Handling updates within a context manager

To ensure you do not forget to update your file after making changes, we can enclose our file-changing operations within a context manager. This ensures that all operations done in this context (the with block) are written to the file once the context is left. Note that the underlying file is also closed after the context ends. This is useful when you have to update lots of files, as there is a limited number of file handles an operating system can deal with.

[12]:
with WeldxFile("example.asdf", mode="rw") as example:
    example["updated"] = True
    fh = example.file_handle
    # now the context ends, and the file is being saved to disk again.

# let's check that the file handle has been closed after the context ended.
assert fh.closed

Let us inspect the file once again, to see whether our updated item has been correctly written.

[13]:
WeldxFile("example.asdf")
<IPython.core.display.JSON object>

If an error is triggered (e.g. an exception is raised) inside the context, the underlying file is still updated. You can prevent this behavior by passing sync=False during file construction.

[14]:
try:
    with WeldxFile("example.asdf", mode="rw") as file:
        file["updated"] = False
        raise Exception("oh no")
except Exception as e:
    print("expected error:", e)
expected error: oh no
[15]:
WeldxFile("example.asdf")
<IPython.core.display.JSON object>
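
The write-on-exit behavior described in this section can be sketched with a small, self-contained context manager. The class SyncedJsonFile below is hypothetical and stores its tree as JSON; it is not how WeldxFile is implemented. It only illustrates the pattern: changes are written back when the context ends, even after an exception, unless sync=False was passed.

```python
import json
import tempfile
from pathlib import Path


class SyncedJsonFile:
    """Hypothetical dict-backed JSON file that writes changes when its context ends."""

    def __init__(self, path, sync=True):
        self.path = Path(path)
        self.sync = sync
        self.tree = json.loads(self.path.read_text()) if self.path.exists() else {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Write the tree back even if an exception occurred, unless sync is off.
        if self.sync:
            self.path.write_text(json.dumps(self.tree))
        return False  # do not swallow the exception


def update_and_fail(path, sync):
    """Change an item, raise inside the context, and return what ended up on disk."""
    try:
        with SyncedJsonFile(path, sync=sync) as f:
            f.tree["updated"] = False
            raise RuntimeError("oh no")
    except RuntimeError:
        pass
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.json"
    path.write_text(json.dumps({"updated": True}))

    not_synced = update_and_fail(path, sync=False)  # change is discarded
    synced = update_and_fail(path, sync=True)  # change is written despite the error

print(not_synced)  # {'updated': True}
print(synced)  # {'updated': False}
```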

9.5. Keeping a log of changes when manipulating a file

It can be quite handy to know what has been done to a file in the past. Weldx files provide a history log, in which arbitrary strings can be stored together with time stamps and the software used. We quickly run you through the process of adding history entries to your file.

[16]:
filename_hist = "example_history.asdf"
with WeldxFile(filename_hist, mode="rw") as file:
    file["some"] = "changes"
    file.add_history_entry("added some changes")
[17]:
WeldxFile(filename_hist).history
[17]:
[{'description': 'added some changes',
  'software': {'author': 'BAM',
   'homepage': 'https://www.bam.de/Content/EN/Projects/WelDX/weldx.html',
   'name': 'weldx',
   'version': '0.6.1.dev0+gb7f599a.d20220429'},
  'time': datetime.datetime(2022, 4, 29, 10, 56, 18)}]

You may also want to describe custom software, say a library or tool that was used to generate or modify the data in the file. To do so, pass a description of it via the software_history_entry argument when creating the WeldxFile.

[18]:
software = dict(
    name="my_tool", version="1.0", homepage="https://my_tool.org", author="the crowd"
)
with WeldxFile(filename_hist, mode="rw", software_history_entry=software) as file:
    file["some"] = "changes"
    file.add_history_entry("added more changes")

Let’s now inspect the history entry we just wrote.

[19]:
WeldxFile(filename_hist).history[-1]
[19]:
{'description': 'added more changes',
 'software': {'author': 'the crowd',
  'homepage': 'https://my_tool.org',
  'name': 'my_tool',
  'version': '1.0'},
 'time': datetime.datetime(2022, 4, 29, 10, 56, 18)}

The entries key is a list of all log entries, to which new entries are appended. Each entry carries a proper time stamp indicating when the change happened, the actual log message, and optionally the custom software used to make the change.
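
The structure of such a log can be sketched with plain Python data types. The standalone function below is hypothetical and only mirrors the shape of the entries shown in the outputs above (description, optional software, time); it is not the WeldxFile.add_history_entry method.

```python
from datetime import datetime


def add_history_entry(history, description, software=None):
    """Append a log entry; `software` is an optional dict describing the tool used."""
    entry = {
        "description": description,
        # The entries shown above carry second precision, so drop the microseconds.
        "time": datetime.now().replace(microsecond=0),
    }
    if software is not None:
        entry["software"] = dict(software)
    history.append(entry)
    return entry


history = []
add_history_entry(history, "added some changes")
add_history_entry(
    history,
    "added more changes",
    software={"name": "my_tool", "version": "1.0"},
)

print(history[-1]["description"])  # added more changes
```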

9.6. Handling of custom schemas

An important aspect of WelDX or ASDF files is that you can validate them against a defined schema. A schema defines the required and optional attributes a tree structure has to provide to pass the schema validation. Furthermore, the types of these attributes can be defined, e.g. the data attribute should be a NumPy array, or a timestamp should be of type pandas.Timestamp. There are several schemas provided by WelDX, which can be used by passing them to the custom_schema argument. It is expected to be a path-like type, so a string (str) or pathlib.Path is accepted. The provided utility function get_schema_path returns the path to the named schema, so its output can be used directly in WeldxFile(custom_schema=…).

[20]:
from weldx.asdf.util import get_schema_path
[21]:
schema = get_schema_path("single_pass_weld-0.1.0")
schema
[21]:
PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/weldx/conda/v0.6.0_a/lib/python3.9/site-packages/weldx/schemas/weldx.bam.de/weldx/datamodels/single_pass_weld-0.1.0.yaml')

This schema defines a complete experimental setup with measurement data. For example, it requires the following attributes to be defined in our tree:

  • workpiece

  • TCP

  • welding_current

  • welding_voltage

  • measurements

  • equipment

We now use a testing function to provide this data, which we will validate against the schema by passing the custom_schema argument during WeldxFile creation. First we just have a look at the process parameters sub-dictionary.

[22]:
from weldx.asdf.cli.welding_schema import single_pass_weld_example

_, single_pass_weld_data = single_pass_weld_example(out_file=None)
display(single_pass_weld_data["process"])
{'welding_process': GmawProcess(base_process='pulse', manufacturer='CLOOS', power_source='Quinto', parameters={'wire_feedrate': <TimeSeries>
 Constant value:
        10.0
 Units:
        m / min
 , 'pulse_voltage': <TimeSeries>
 Constant value:
        40.0
 Units:
        V
 , 'pulse_duration': <TimeSeries>
 Constant value:
        5.0
 Units:
        ms
 , 'pulse_frequency': <TimeSeries>
 Constant value:
        100.0
 Units:
        Hz
 , 'base_current': <TimeSeries>
 Constant value:
        60.0
 Units:
        A
 }, tag='CLOOS/pulse', meta={'modulation': 'UI'}),
 'shielding_gas': ShieldingGasForProcedure(use_torch_shielding_gas=True, torch_shielding_gas=ShieldingGasType(gas_component=[GasComponent(gas_chemical_name='argon', gas_percentage=<Quantity(82, 'percent')>), GasComponent(gas_chemical_name='carbon dioxide', gas_percentage=<Quantity(18, 'percent')>)], common_name='SG', designation=None), torch_shielding_gas_flowrate=<Quantity(20, 'liter / minute')>, use_backing_gas=None, backing_gas=None, backing_gas_flowrate=None, use_trailing_gas=None, trailing_shielding_gas=None, trailing_shielding_gas_flowrate=None),
 'weld_speed': <TimeSeries>
 Constant value:
        10
 Units:
        mm / s,
 'welding_wire': {'diameter': array(1.2) <Unit('millimeter')>}}

That is a lot of data, containing complex data structures and objects describing the whole experiment including measurement data. We can now create a new WeldxFile and validate the data against the schema.

[23]:
WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode="rw")
<IPython.core.display.JSON object>

But what would happen if we forget an important attribute? Let's have a closer look…

[24]:
# simulate we forgot something important, so we delete the workpiece:
del single_pass_weld_data["workpiece"]

# now create the file again, and see what happens:
try:
    WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode="rw")
except Exception as e:
    display(e)
<ValidationError: "'workpiece' is a required property">

We receive a ValidationError from the ASDF library, which tells us exactly what the missing information is. The same will happen if we accidentally pass the wrong type.
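
What a required-property check amounts to can be sketched in a few lines of plain Python. This is a hand-written illustration and not how the ASDF library validates a tree; the function name missing_required is hypothetical, and the list of required keys is taken from the attributes listed earlier in this section.

```python
# Required top-level attributes, as listed for the single pass weld schema above.
REQUIRED = (
    "workpiece",
    "TCP",
    "welding_current",
    "welding_voltage",
    "measurements",
    "equipment",
)


def missing_required(tree, required=REQUIRED):
    """Return all required keys that are absent from the tree."""
    return [key for key in required if key not in tree]


# Build a placeholder tree that satisfies the check, then remove one attribute.
tree = dict.fromkeys(REQUIRED, "placeholder")
complete = missing_required(tree)

del tree["workpiece"]
missing = missing_required(tree)

print(complete)  # []
print(missing)  # ['workpiece']
```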

[25]:
# simulate a wrong type by changing it to a NumPy array.
single_pass_weld_data["welding_current"] = np.zeros(10)

# now create the file again, and see what happens:
try:
    WeldxFile(tree=single_pass_weld_data, custom_schema=schema, mode="rw")
except Exception as e:
    display(e)
<ValidationError: "mismatched tags, wanted 'asdf://weldx.bam.de/weldx/tags/core/time_series-0.1.*', got 'tag:stsci.edu:asdf/core/ndarray-1.0.0'">

Here we see that a time_series tag is expected, but an asdf/core/ndarray-1.0.0 tag was received. The ASDF library assigns tags to certain types to handle their storage in the file format. As shown in the error message, the time_series tag belongs to the weldx/tags/core namespace provided by weldx.bam.de. The tags and schemas also carry a version number, so future updates of the software remain manageable.
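
The wanted tag in the error message above ends with a wildcard in its version number. A minimal sketch of how such a pattern can be compared against concrete tags is given below, assuming glob-style matching as provided by the standard library module fnmatch. Whether the ASDF library matches tags in exactly this way is an assumption of this sketch, and the concrete version 0.1.0 of the time series tag is a made-up example.

```python
from fnmatch import fnmatchcase

# Pattern and received tag as they appear in the error message above.
wanted = "asdf://weldx.bam.de/weldx/tags/core/time_series-0.1.*"
ndarray_tag = "tag:stsci.edu:asdf/core/ndarray-1.0.0"

# Hypothetical concrete tag that would satisfy the pattern.
time_series_tag = "asdf://weldx.bam.de/weldx/tags/core/time_series-0.1.0"

matches_time_series = fnmatchcase(time_series_tag, wanted)
matches_ndarray = fnmatchcase(ndarray_tag, wanted)

print(matches_time_series)  # True
print(matches_ndarray)  # False
```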

Custom schemas can be used to define your own protocols or standards describing your data.

9.7. Summary

In this tutorial we have seen how to easily open, inspect, manipulate, and update ASDF files created by WelDX. We’ve learned that these files can store a variety of different data types and structures.

Discussed features:

  • Opening in read/write mode WeldxFile(mode="rw").

  • Creating files in memory (passing no file name to WeldxFile() constructor).

  • Writing to disk (WeldxFile.write_to).

  • Keeping a log of changes (WeldxFile.history, WeldxFile.add_history_entry).

  • Validation against a schema WeldxFile(custom_schema="/path/my_schema.yaml")
