<!DOCTYPE html>
<html>
<head>
</head>
<body>
<font face="Arial,Helvetica,Times" size="3">
<center>
<font size="+2"><img src="CIFtbx_Manual_files/CIFtbx_logo.jpg" alt="CIFtbx"></font><br />
<font size="+3">
<b>Manual</b><br />
</font>
by<br />
Herbert J. Bernstein<br />
and<br />
Sydney R. Hall<br />
Copyright © 1997, 1998, 2024<br />
All Rights Reserved
<p>
</center>
<h2 id="Preface" align="center">Preface</h2>
<p>
The Crystallographic Information File (CIF) format is one of the most commonly used electronic data handling
protocols in chemistry and crystallography for exchanging and archiving structural and diffraction information.
Because the CIF syntax requirements and data definitions are coordinated and supported by the International
Union of Crystallography as part of their publishing and archival activities, there is established support
for this format both in database entry and in retrieval tasks. This spawned the need to develop more comprehensive
software for generating, reading and manipulating CIF data for a wide
range of scientific domains. Two good starting points for finding appropriate CIF resources are the IUCr web site
<a href="https://www.iucr.org/resources/cif">https://www.iucr.org/resources/cif</a> and searches on GitHub <a href="http://github.com">http://github.com</a>
for the phrases "Crystallographic Information File" and "Crystallographic Information Framework".
<p>
This book is an instruction and reference manual for programmers employing the
CIFtbx library of Fortran functions to develop CIF applications.
The most recent releases of the <i>CIFtbx</i> library are open source software. You
may redistribute this software and/or modify this software under the
terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License
<A href="https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html">https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html</a>, or (at your
option) any later version.
Alternatively you may redistribute and/or modify the <i>CIFtbx</i> API (but not
the programs) under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1 of the
License
<A href="https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html">https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html</a>,
or (at your option)
any later version.
This book itself is part of the <i>CIFtbx</i> version 4.1.1 library release, and as such may be redistributed and/or modified under the terms
of the GPL. In addition, this book, as a document, may alternatively be distributed, remixed, adapted, and built upon in any medium
or format, even for commercial purposes, under the terms of the CC BY-SA 4.0 license published by Creative Commons (CC)
<a href="https://creativecommons.org/licenses/by-sa/4.0/legalcode.en">https://creativecommons.org/licenses/by-sa/4.0/legalcode.en</a>.
<p>
The <i>CIFtbx</i> library and this manual are intended for both novices and experts of CIF applications.
The toolbox has already been used in the development of CIF manipulation programs such as Cyclops
<a href="#Bernstein_Hall_1998">[Bernstein, Hall, 1998]</a>,
CIFIO <a href="#Hall_1993">[Hall, 1993]</a>,
CIF2CIF <a href="#Bernstein_1997">[Bernstein, 1997]</a>,
pdb2cif <a href="#Bernstein_Bernstein_Bourne_1998">[Bernstein, Bernstein, Bourne, 1998]</a>
and cif2pdb <a href="#Bernstein_Bernstein_1996">[Bernstein, Bernstein, 1996]</a>.
Extracts from some of these applications are used herein to illustrate various programming approaches.
<p>
This edition of the manual is for use with <i>CIFtbx</i> version 4.1.1. Scientific papers on <i>CIFtbx</i>
<a href="#Hall_1993a">[Hall, 1993a]</a>
<a href="#Hall_Bernstein_1996">[Hall, Bernstein, 1996]</a>
provide background information on earlier versions of the tool box but lack the detail
of a reference manual. The <i>CIFtbx</i> tools described in this manual are appropriate for
many current CIF applications and dictionaries. This includes the access and application
of data definitions in dictionaries based on the definition language DDL1
<a href="#Hall_Cook_1995">[Hall, Cook, 1995]</a>,
and on the extended language DDL2
<a href="#Westbrook_Hall_1995">[Westbrook, Hall, 1995]</a>
such as the macromolecular dictionary
<a href="#Fitzgerald_et_al_1996">[Fitzgerald <i>et al.</i>, 1996]</a>, as well as some
aspects of the most recent version of CIF, CIF 2.0
<a href="#Bernstein_Brown_Gražulis_2016">[Bernstein, Brown, Gražulis <i>et al.</i>, 2016]</a>,
and of the most recent Dictionary Definition Language, DDLm <a href="#Spadaccini_Hall_2012">[Spadaccini, Hall, 2012]</a>.
<p>
The first two chapters of the manual introduce the general concepts of the CIF syntax and
are intended for programmers who have no prior knowledge of this format. This is the
initial primer information. Later chapters give detailed explanations on the 21 functions
and 36 variables that make up the tools, and how they are applied to simple and complex tasks.
The appendices at the end of the manual explain how to implement the tool box software on
your computer, and provide background information on the DDL used to define CIF data items,
and to construct CIF dictionaries.
<p>
<h2 align="center">CONTENTS</h2>
<p>
<a href="#Preface">Preface</a>
<p>
<a href="#Recent_History_and_Acknowledgements">Recent History and Acknowledgements</a>
<p>
<a href="#Primer_Section">Primer Section</a>
<p>
<a href="#CHAPTER_1">1. What is a CIF?</a>
<ul>
<li><a href="#1.1">1.1. Introduction</a></li>
<li><a href="#1.2">1.2. Basic syntax</a></li>
<li><a href="#1.3">1.3. Case sensitivity</a></li>
<li><a href="#1.4">1.4. Special characters</a></li>
<li><a href="#1.5">1.5. Syntax control words</a></li>
<li><a href="#1.6">1.6. File examples</a></li>
<li><a href="#1.7">1.7. Data definitions</a></li>
<li><a href="#1.8">1.8. Handling DDL1 and DDL2 name structures</a></li>
</ul>
<p>
<a href="#CHAPTER_2">2. Overview of the Tool Box</a>
<ul>
<li><a href="#2.1">2.1. Introduction</a></li>
<li><a href="#2.2">2.2. Initialisation Commands</a></li>
<li><a href="#2.3">2.3. Read Commands</a></li>
<li><a href="#2.4">2.4. Write Commands</a></li>
<li><a href="#2.5">2.5. Variables</a></li>
<li><a href="#2.6">2.6. Name Aliases</a></li>
</ul>
<p>
<a href="#CHAPTER_3">3. How to Use the Tool Box</a>
<ul>
<li><a href="#3.1">3.1. Introduction</a></li>
<li><a href="#3.2">3.2. Reading CIF data</a></li>
<li><a href="#3.3">3.3. Reading text data in loops</a></li>
<li><a href="#3.4">3.4. Reading user-requested data items</a></li>
<li><a href="#3.5">3.5. Creating a CIF</a></li>
<li><a href="#3.6">3.6. General tips on applying <i>CIFtbx</i></a></li>
<li><a href="#3.6.1">3.6.1. Reading a CIF</a></li>
<li><a href="#3.6.2">3.6.2. Writing a CIF</a></li>
<li><a href="#3.6.3">3.6.3. Program organisation</a></li>
</ul>
<p>
<a href="#Reference_Section">Reference Section</a>
<p>
<a href="#CHAPTER_4">4. Initialisation Functions</a>
<p>
<ul>
<li><a href="#4.1">4.1. Introduction</a></li>
<li><a href="#4.2">4.2. init_</a></li>
<li><a href="#4.3">4.3. dict_</a></li>
</ul>
<p>
<a href="#CHAPTER_5">5. Read Functions</a>
<ul>
<li><a href="#5.1">5.1. Introduction</a></li>
<li><a href="#5.2">5.2. ocif_</a></li>
<li><a href="#5.3">5.3. data_</a></li>
<li><a href="#5.4">5.4. bkmrk_</a></li>
<li><a href="#5.5">5.5. find_</a></li>
<li><a href="#5.6">5.6. test_</a></li>
<li><a href="#5.7">5.7. name_</a></li>
<li><a href="#5.8">5.8. numb_</a></li>
<li><a href="#5.9">5.9. numd_</a></li>
<li><a href="#5.10">5.10. cmnt_</a></li>
<li><a href="#5.11">5.11. purge_</a></li>
</ul>
<p>
<a href="#CHAPTER_6">6. Write Functions</a>
<ul>
<li><a href="#6.1">6.1. Introduction</a></li>
<li><a href="#6.2">6.2. pfile_</a></li>
<li><a href="#6.3">6.3. pdata_</a></li>
<li><a href="#6.4">6.4. ploop_</a></li>
<li><a href="#6.5">6.5. pchar_</a></li>
<li><a href="#6.6">6.6. pcmnt_</a></li>
<li><a href="#6.7">6.7. pnumb_</a></li>
<li><a href="#6.8">6.8. pnumd_</a></li>
<li><a href="#6.9">6.9. ptext_</a></li>
<li><a href="#6.10">6.10. prefx_</a></li>
<li><a href="#6.11">6.11. close_</a></li>
</ul>
<p>
<a href="#CHAPTER_7">7. Variables</a>
<p>
<a href="#CHAPTER_8">8. Error Message Glossary</a>
<ul>
<li><a href="#Fatal_Errors">Fatal Errors</a>
<ul>
<li><a href="#Array_Bounds">Array Bounds</a></li>
<li><a href="#Data_Sequence">Data Sequence, Syntax and File Construction</a></li>
<li><a href="#Invalid_Arguments">Invalid Arguments</a></li>
</ul></li>
<li><a href="#Warnings">Warnings</a>
<ul>
<li><a href="#Output_Errors">Output Errors</a></li>
<li><a href="#Dictionary_Checks">Dictionary Checks</a></li>
</ul></li>
</ul>
<p>
<a href="#Appendices">Appendices</a>
<ul>
<li><a href="#APPENDIX_A">A. Usage Restrictions and Policy</a>
<ul>
<li><a href="#GPL">GPL</a></li>
<li><a href="#LGPL">LGPL</a></li>
<li><a href="#RASLIC">RASLIC</a></li>
<li><a href="#IUCR_POLICY">IUCr Policy</a></li>
</ul></li>
<li><a href="#APPENDIX_B">B. Installation of <i>CIFtbx</i></a>
<ul>
<li>Installation</li>
<li>Reporting Problems</li>
</ul></li>
<li><a href="#APPENDIX_C">C. CYCLOPS2</a>
<ul>
<li>CYCLOPS2 overview</li>
<li>Error Message Glossary</li>
</ul></li>
<li><a href="#APPENDIX_D">D. Syntax of a STAR File</a></li>
<li><a href="#APPENDIX_E">E. Internals and Programming Style</a></li>
</ul>
<p>
<a href="#References">References</a>
<p>
<a href="#Index">Index</a>
<p>
<h2 id="Recent_History_and_Acknowledgements" align="center">Recent History and Acknowledgements</h2>
The CIF format was first adopted by the IUCr for journal submissions in 1990, following
the publication of the CIF core dictionary
<a href="#Hall_Allen_Brown_1991">[Hall, Allen, Brown, 1991]</a>.
Since then there has been continual growth in the use of CIFs and in the development of
software for CIF
generation and manipulation. In the early 1990s there were appreciable changes to
the nature of CIF applications. These
were brought about largely by new data definitions in the macromolecular CIF
dictionary
<a href="#Bourne_et_al_1996">[Bourne <i>et al.</i>, 1996]</a>
and <a href="#Fitzgerald_et_al_1996">[Fitzgerald <i>et al.</i>, 1996]</a>.
This, and the powder diffraction dictionary
<a href="#Toby_von_Dreele_Larson_2003">[Toby, von Dreele, Larson, 2003]</a>,
were adopted by the IUCr in June 1997 as standards for exchanging
crystallographic data in these fields. The adoption of the macromolecular dictionary, in
particular, signaled a watershed in the
way that this type of structural data will be handled in the future.
<p>
One recent impetus for newer and more versatile versions of <i>CIFtbx</i> was to
assist one of us (HJB) in using CIF data derived
from Protein Data Bank files
<a href="#Bernstein_et_al_1977">[Bernstein, <i>et al.</i>, 1977]</a>. In the
development of pdb2cif <a href="#Bernstein_Bernstein_Bourne_1998">[Bernstein, Bernstein, Bourne, 1998]</a>,
<i>CIFtbx</i>2 enabled hundreds of CIF
data names, embedded in existing software,
to be mapped into the DDL2 format, and for the existence of these items to be checked.
<i>CIFtbx</i>2 was used in a release of
the Xtal 3.5 System <a href="#Hall_King_Stewart_1995">[Hall, King, Stewart, 1995]</a>,
and in the upgrade of CYCLOPS to CYCLOPS2
<a href="#Hall_Bernstein_1996">[Hall, Bernstein, 1996]</a>
(see <a href="#APPENDIX_C">Appendix C</a>). It
has provided the platform for the creation of cif2cif, a program which checks and
reformats CIFs. <i>CIFtbx</i>2 was used for rapid adaptation
of a command-line driven lattice identification program to CIF [Bernstein, Andrews, 1996].
<p>
A primary objective with this toolbox has been to preserve the functionality of all dictionaries written in the core dictionary language
DDL1 while providing a seamless link to the richer DDL2 dictionaries. During this development we have leaned heavily on the cooperation
of our colleagues and collaborators. Many people have contributed to the CIF development, and although we are certain to omit
many workers who have given valuable help at some stage, we must highlight the special recent efforts of Helen Berman, Frances
Bernstein, Phil Bourne, Paula Fitzgerald, Brian McMahon and John Westbrook.
<p>
<h1 id="CHAPTER_1" align="center">
CHAPTER 1
</h1>
<h2 id="What_is_a_CIF" align="center">
What is a CIF?
</h2>
<h3 id="1.1" align="left">1.1. Introduction
</h3>
<p>
What is a CIF? To a crystallographer or a structural chemist, it is a simple and flexible way
of storing and exchanging
numerical or text data electronically. The letters C-I-F stand for Crystallographic Information
File <a href="#Hall_Allen_Brown_1991">[Hall, Allen, Brown, 1991]</a>.
A CIF is a text file that can be easily read by humans or computers because of its very simple format. The rules governing
this format are a subset of the general syntax of the Self-Defining Text Archive and Retrieval (STAR) File [Hall, 1991].
<p>
The CIF format is extremely flexible. Data items may be placed anywhere in a file or a line, and in any order, provided
that each data value is preceded by an identifying label. Here is an extract from a CIF. The data values are in bold type
and the data identifiers (or names) are strings starting with an underscore.
<p>
<center><table border=1><tr><td><pre>
_crystal_habit <b>irregular_tetrahedron</b>
_crystal_colour <b>'blue green'</b>
_crystal_density <b>1.765(4)</b>
loop_
_crystal_face_index_h
_crystal_face_index_k
_crystal_face_index_l
_crystal_face_dist_from_centre # in millimetres
<b> 1 1 1 0.25
-1 -1 1 0.27
1 -1 -1 0.25
-1 1 -1 0.29</b>
_crystal_preparation
<b>; The compound is crystallised from ethanol by slow
evaporation.
;</b></pre></td></tr></table></center>
<p>
<h3 id="1.2" align="left">1.2. Basic syntax
</h3>
<p>
The above example illustrates many of the basic principles of a CIF.
<ol>
<li>All contents are plain text. The definition of "plain" text is system
dependent. For modern Unix-compatible systems, that is most likely to be ASCII
or Unicode UTF-8 text.</li>
<li>Each <i>data value</i> (shown above in bold) must be preceded by an identifying
<i>data name</i>.</li>
<li>A <i>data name</i> (or tag) is a character string starting with an underscore character.</li>
<li><i>Data values</i> are of three basic types: number strings, character strings and text strings.
<ul>
<li>A <i>number string</i> may be in integer, decimal or scientific notation.
Numbers may have an error
estimate appended within parentheses (see <tt>_crystal_density</tt> above),
if this is allowed by the data definition (see DDL description below).</li>
<li>A <i>character string</i> is a sequence of characters that is not a number,
does not start with an underscore,
and does not exceed a system-dependent number of
characters in length. For all versions of CIF it is possible for
a string to be as long as 78 characters, and for most modern versions
of CIF and most modern computer systems, a string may be as long
as 2046 characters. If the string contains blanks it must be
surrounded by quote characters (see <tt>_crystal_colour</tt>).</li>
<li>A text string may be one or more lines in length and must be bounded by
semicolons in column 1 preceding the first
character, and following the last character (see <tt>_crystal_preparation</tt>).
Unless the data definition imposes more restrictive
rules, a text string may be used any place where a character string might be expected.</li>
</ul>
</li>
<li>Lists of repeated data values are to be preceded by data names in matching order. Such lists must be preceded by a loop_ command.</li>
<li>A data name and its value (<i>i.e.</i> a <i>tag/value pair</i>, or <i>tuple</i>) are together referred to as
a <i>data item</i>. Data items are grouped into data
blocks. A data block is preceded by a <tt>data_</tt>&lt;name&gt; command. The &lt;name&gt; string is
referred to as the data block name and this
must be unique within a CIF.</li>
<li>Within a data block, each data name must be unique.</li>
<li>Lines in a CIF were originally restricted to 80 characters; the CIF 1.1 specification raised this limit to 2048 characters.</li>
<li>The hash character '#' is used to start a comment on a line.</li>
</ol>
<p>
<h3 id="1.3" align="left">
1.3. Case sensitivity
</h3>
<p>
Data names are not sensitive to the case of letters. For example, the strings
<p>
<pre>
_ATOM_SITE_CARTN_X
_atom_site_cartn_x
_AtOm_SiTe_CaRtN_x</pre>
<p>
all represent the identical data name in a CIF. Strings that are not data names are case sensitive in that the case of letters must always be preserved.
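<p>
As a minimal sketch of this rule, written in Python rather than in the Fortran of <i>CIFtbx</i>, a program might fold the case of data names before comparing them, while leaving data values exactly as read. The function name <tt>same_data_name</tt> is hypothetical and is not part of the toolbox.

```python
def same_data_name(a, b):
    """Return True if two CIF data names match, ignoring letter case."""
    # Only data names are folded; data values must keep their case.
    return a.lower() == b.lower()


names = ["_ATOM_SITE_CARTN_X", "_atom_site_cartn_x", "_AtOm_SiTe_CaRtN_x"]
all_same = all(same_data_name(names[0], n) for n in names)
```

<p>
Under these assumptions the three spellings shown above compare as equal, whereas two values such as <tt>Triclinic</tt> and <tt>triclinic</tt> would be kept as distinct strings.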
<p>
<h3 id="1.4" align="left">
1.4. Special characters
</h3>
<p>
Certain characters in a CIF serve a special function when used in a particular way.
A brief summary of these is given below. For more detail see
<a href="#Hall_Spadaccini_1994">[Hall, Spadaccini, 1994]</a>.
<p>
<table border=0>
<tr><td valign="top">_</td><td>the underscore (underline) is used to start a
data name, or to end a command string, such as <tt>loop_</tt>.
Underscores also terminate <i>CIFtbx</i> function and variable names. They are sometimes used
to replace blanks in strings so as to avoid surrounding quotes.
</td></tr>
<tr><td valign="top"><i>&lt;w&gt;</i></td><td> "white-space" characters such as blanks, tabs
and end-of-lines are used to delimit fields
in a CIF, <i>i.e.</i> one or more white-space characters serve to separate data names and values,
provided the white-space characters are not inside a quoted string (as with the <tt>_crystal_colour</tt>
value above) or a text string (as with the <tt>_crystal_preparation</tt> value above).
</td></tr>
<tr><td valign="top">#</td><td> the hash mark (sharp) disables syntactic processing of characters following on a line,
except within a quoted or text string. The hash is used for comments in a CIF.
</td></tr>
<tr><td valign="top">'</td><td> the single quote (apostrophe) may be used to protect a character string, but not a number
or text string, from internal syntactic processing. This is done by surrounding a character
sequence with quote characters. More precisely, the string must start with the digraph <i>&lt;w&gt;</i>'
and end with the digraph '<i>&lt;w&gt;</i>. Within such a string, characters such as _, <i>&lt;w&gt;</i>, # and " do not
have special properties. Note that the ' character may also be placed within this string
provided that it is not immediately followed by a <i>&lt;w&gt;</i> character. The character string must
not span multiple lines.
</td></tr>
<tr><td valign="top">"</td><td> the double quote serves the same function as '.
</td></tr>
<tr><td valign="top">;</td><td> the semicolon, if used as the first character in a line,
is used to start and finish a sequence of lines, referred to as a string of type text.
The sequence newline-semicolon serves very much the same purpose as the single and
double quotes, but is the only way to provide multiple-line text as a value.
</td></tr>
<tr><td valign="top">.</td><td> the period character has a special meaning when used
by itself as a data value. It usually means "the default value".
</td></tr>
<tr><td valign="top">?</td><td> the question mark character has a special meaning
when used by itself as a data value. It usually means "value unknown".
</td></tr>
<tr><td valign="top">$</td><td> the dollar sign is normally NOT a special character
in a CIF. However, if a CIF contains save frames <a href="#Hall_Spadaccini_1994">[Hall, Spadaccini, 1994]</a>, the
dollar sign is used at the start of save frame names when referred to as data values.
To avoid confusion, do not use an unquoted dollar sign as a value in a CIF.
</td></tr></table>
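<p>
The white-space, quote and hash rules summarised above can be sketched for a single line as follows. This is an illustrative Python fragment, not the parser used inside <i>CIFtbx</i>; the function name <tt>tokenize_line</tt> is hypothetical, multi-line text strings bounded by semicolons are deliberately not handled, and a hash is treated as starting a comment only when it begins a field.

```python
def tokenize_line(line):
    """Split one CIF line into fields, honouring quotes and # comments."""
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1                      # white space only separates fields
        elif ch == "#":
            break                       # the rest of the line is a comment
        elif ch in "'\"":
            # A closing quote must be followed by white space or end of line,
            # so an embedded quote such as the one in "it's" is kept.
            j = i + 1
            while j < n and not (line[j] == ch and
                                 (j + 1 == n or line[j + 1].isspace())):
                j += 1
            if j == n:
                raise ValueError("unterminated quoted string")
            tokens.append(line[i + 1:j])
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            tokens.append(line[i:j])
            i = j
    return tokens


fields = tokenize_line("_crystal_colour 'blue green'  # observed colour")
```

<p>
In this sketch the quotes are stripped from the returned value, so a real application would also need to record that the field was quoted in order to distinguish, for example, a quoted <tt>'?'</tt> from the special value <tt>?</tt>.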
<p>
<h3 id="1.5" align="left">1.5. Syntax control words
</h3>
<p>
Special words in a CIF control and signal syntax changes. These words can be easily
recognised by a trailing underscore character.
<p>
<table border=0>
<tr><td valign="top">
<b><tt>data_</tt></b></td><td>signals the start of a new data block. A name is appended
to this string
(e.g. <tt>data_crystal_description</tt>). Each data block name within a CIF must be unique.
<p></td></tr><tr><td valign="top">
<b><tt>loop_</tt></b></td><td>signals the start of a repeated list of data items.
<tt>loop_</tt> is followed
by the data names of all items in the list. Then come the data values, in the same order
as the data names, and these are repeated until another data name or control string is
encountered.
<p></td></tr><tr><td valign="top">
<b><tt>global_</tt></b></td><td>signals the start of a new global data block. This
serves the same function
as <tt>data_</tt>, except that the items it contains are assumed to be "global" rather than
specific to a particular data block. Global data blocks do not have block names.
<p></td></tr><tr><td valign="top">
<b><tt>stop_</tt></b></td><td>signals the end of a nested list. Nested lists are
not currently used in CIFs but they are in a STAR File
<a href="#Hall_Spadaccini_1994">[Hall, Spadaccini, 1994]</a>. An example of this is
shown later.
<p></td></tr><tr><td valign="top">
<b><tt>save_</tt></b></td><td>signals the start, and the end, of a save frame.
Save frames are not used in CIFs, but they are in DDL2 dictionaries. A save
frame is used as a "macro" within a data block to contain one or more data items.
A save frame is "addressable" via a frame code, and each data name within a frame
must be unique. A data block may contain any number of save frames.
Because the identity of data items within a
save frame is "protected" from items outside this frame, the same data name may be
used in the data block or in other save frames. The <tt>save_</tt> string at the start
of a frame has an appended code that is unique within the data block. This code,
preceded by a dollar character, <i>i.e.</i> $&lt;code&gt;, may be
referred to as a data value, so as to "point to" a specific frame of data items.
The <tt>save_</tt> string closing a frame does not have a code attached.
</td></tr>
</table>
<p>
<h3 id="1.6" align="left">1.6. File examples
</h3>
<p>
Some data file examples will now be used to illustrate syntax requirements.
<h3 id="1.6.1" align="left">1.6.1 A typical structural CIF
</h3>
<p>
Here is an abbreviated version of a typical CIF.
<p>
<center><table border=1><tr><td><pre>
<b>data_</b>xtest2
_chemical_name_systematic
hexamethyl-4,8-dioxaundecanedioate)bis(pyridine)dirhodium
_chemical_formula_sum 'C40 H62 N2 O12 Rh2 '
_chemical_formula_moiety ?
_chemical_formula_weight 498.35
_symmetry_cell_setting triclinic
_symmetry_space_group_name_H-M 'P -1'
<b>loop_</b>
_symmetry_equiv_pos_as_xyz
'x,y,z' '-x,-y,-z'
_cell_length_a 8.586(8)
_cell_length_b 15.286(11)
_cell_length_c 15.606(8)
_cell_angle_alpha 94.57(4)
_cell_angle_beta 92.31(4)
_cell_angle_gamma 100.58(4)
_cell_formula_units_Z 4
_cell_volume 2004(3)
</pre></td></tr></table></center>
<p>
Note the following in this example.
<ul>
<li>The alignment of character strings in this (and any other) CIF is largely a
matter of taste. Changing the white space between data names or values does
not affect the meaning of the data; nor does any reordering of the items.
The term "item" refers to a tag/value pair.
</li>
<li>The quotes are not needed for the <tt>_symmetry_equiv_pos_as_xyz</tt> values because
they contain no embedded blanks; however, their presence does not alter the
value. Because of embedded blanks in the formula, the quotes bounding the
<tt>_chemical_formula_sum</tt> string are required. Double quotes would have worked as well.
</li>
<li>Because the value of <tt>_chemical_formula_moiety</tt> is unknown, its value is shown
as a question mark. This item (<i>i.e.</i> the tag and the value) could have been omitted
from the file; however, it is often convenient to retain the data name of a missing
value as a reminder that it needs to be added.
</li>
</ul>
<p>
One of the most common errors in a CIF is to leave a blank field in place of a missing value
instead of writing a question mark, as this violates the requirement to match tags to values.
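<p>
The requirement to match tags to values can be checked mechanically for a looped list: the number of values must be a whole multiple of the number of data names. The following Python fragment is a minimal sketch of that check, not the method used by <i>CIFtbx</i>; the function name <tt>loop_packets</tt> is hypothetical, and the values are assumed to have already been split into fields.

```python
def loop_packets(names, values):
    """Group looped values into packets holding one value per data name."""
    width = len(names)
    if width == 0 or len(values) % width != 0:
        # A blank field left in place of a missing value ends up here.
        raise ValueError("values do not match the looped data names")
    return [dict(zip(names, values[k:k + width]))
            for k in range(0, len(values), width)]


names = ["_crystal_face_index_h", "_crystal_face_index_k",
         "_crystal_face_index_l", "_crystal_face_dist_from_centre"]
values = ["1", "1", "1", "0.25", "-1", "-1", "1", "0.27"]
packets = loop_packets(names, values)
```

<p>
Here the eight values form two packets of four. Had the last distance been left blank rather than written as <tt>?</tt>, only seven values would remain and the sketch would report the mismatch.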
<h3 id="1.6.2" align="left">1.6.2 A STAR File
</h3>
<p>
Here is an example of a STAR File to illustrate its much more extensive syntax.
This file contains quantum chemical data on the water molecule. One can see that a
STAR File is in most respects identical to a CIF.
<p>
<center><table border=1><tr><td><pre>
data_water
_qchem_chemical_name_common water
_qchem_chemical_name_IUPAC 'oxygen dihydride'
_qchem_chemical_formula 'H2 O'
loop_
_qchem_molecular_site_number
_qchem_molecular_site_label
_qchem_molecular_site_symbol
_qchem_molecular_site_x
_qchem_molecular_site_y
_qchem_molecular_site_z
_qchem_molecular_site_mass
1 O1 O 0.00000 0.00000 0.00000 15.994915
2 H1 H 0.00000 0.75753 0.58707 1.007825
3 H2 H 0.00000 -0.75753 0.58707 1.007825
_qchem_molecular_mass_centre_x 0.0000000
_qchem_molecular_mass_centre_y 0.0000000
_qchem_molecular_mass_centre_z 0.0657023
loop_
_qchem_basis_set_atom_name
_qchem_basis_set_atom_symbol
_qchem_basis_set_contraction_scheme
_qchem_basis_set_funct_per_contraction
loop_
_qchem_basis_set_function_code
_qchem_basis_set_function_count
_qchem_basis_set_function_exponent
_qchem_basis_set_function_coefficient
oxygen O (9,5,1)->[4,2,1] {6:1:1:1,4:1,1}
s 1 7816.540000 0.002031
s 1 1175.820000 0.015436
s 1 273.188000 0.073771
#...........................................data omitted for space
d 7 0.900000 1.000000 stop_
hydrogen H (4,1)->[2,1] {3:1,1}
s 1 19.240600 0.032828
s 1 2.899200 0.231208
s 1 0.653400 0.817238
s 2 0.177600 1.000000
p 3 1.000000 1.000000 stop_
loop_
_qchem_bond_site_label_1
_qchem_bond_site_label_2
_qchem_bond_distance_au
_qchem_bond_distance
O1 H1 1.811095991 0.958390452
O1 H2 1.811095991 0.958390452
loop_
_qchem_angle_site_label_1
_qchem_angle_site_label_2
_qchem_angle_site_label_3
_qchem_angle
H1 O1 H2 104.44991917
_qchem_molecule_number_atoms 3
_qchem_molecule_number_electrons 10
_qchem_molecule_number_contractions 13
_qchem_molecule_charge 0
_qchem_molecule_state_multiplicity 1
_qchem_molecule_occup_orb_doub 5
_qchem_molecule_occup_orb_sing_alpha 0
_qchem_molecule_occup_orb_sing_beta 0
_qchem_option_converge_criterion 1.0E-05
_qchem_option_variable_level_shift yes
_qchem_calc_energy_electronic -85.230179266
_qchem_calc_energy_nuclear 9.183706230
_qchem_calc_energy_total -76.046473036
</pre></td></tr></table></center>
<p>
Note the following difference between this STAR file and a CIF.
<p>
• The <tt>_qchem_basis_set_</tt> items in this STAR File are in a nested loop.
The <tt>_qchem_basis_set_function_</tt> items are in a level 2 loop.
Note that following the last set (or packet) of data values for these items
there is a <tt>stop_</tt> signal. This causes the nesting to revert to level 1.
<p>
<h3 id="1.7" align="left">1.7 Data definitions
</h3>
<p>
CIF data items used in global data exchange applications, such as in archiving
or publication, are usually defined in an electronic dictionary that has been
formally approved by the IUCr. In that way, those that generate CIF data have
a common understanding of what the data means with those that subsequently
read that data. The definition of data items has become quite rigorous as a
consequence of this requirement and involves special definition protocols
that are incorporated in a dictionary definition language (DDL).
<p>
Each data definition needs to specify the function of an item, and list its
particular characteristics or attributes. For instance, the definition needs
to specify if an item is a number or a character string. Although very few users
of CIF data need to understand how dictionaries and the individual definitions
are constructed, a programmer writing CIF applications will benefit greatly by
knowing about the two types of DDL currently in use, appreciating the types of
information contained within the DDL definitions, and understanding how it can
be employed to validate data.
<p>
The format of all CIF electronic dictionaries conforms to the STAR syntax,
so dictionaries may also be parsed with <i>CIFtbx</i> tools. In fact, the toolbox provides a
specific function to read and cross-check attributes from dictionaries.
<p>
Existing dictionaries are written using two different DDLs. DDL1 has been used to
construct the CIF Core, Powder and several other dictionaries. A more recent dictionary
language, DDL2, is used to specify the macromolecular dictionary mmCIF
<a href="#Fitzgerald_et_al_1996">[Fitzgerald <i>et al.</i>, 1996]</a>.
<p>
<h3 id="1.7.1" align="left">1.7.1 DDL1 definition examples
</h3>
<p>
<h3 id="1.7.1.11" align="left">1.7.1.1 DDL1 example 1
</h3>
<p>
Here is the DDL1 definition of the data items <tt>_atom_site_fract_x</tt>,
<tt>_atom_site_fract_y</tt>, and
<tt>_atom_site_fract_z</tt> from the Core dictionary.
<p>
<center><table border=1><tr><td><pre>
data_atom_site_fract_
loop_
_name
'_atom_site_fract_x'
'_atom_site_fract_y'
'_atom_site_fract_z'
_category atom_site
_type numb
_type_conditions esd
_list yes
_list_reference '_atom_site_label'
_enumeration_default 0.0
_definition
; Atom site coordinates as fractions of the _cell_length_ values.
;</pre></td></tr></table></center>
<p>
The precise meanings of the different DDL1 attributes such as
<tt>_name</tt>, <tt>_category</tt>, <i>etc.</i> are given in
<a href="#Hall_Cook_1995">[Hall, Cook, 1995]</a>
and <a href="#McMahon_1995">[McMahon 1995]</a>.
Note the following in this definition.
<p>
<ul>
<li>
when data items form an irreducible set, such as with the fractional
coordinates x, y, and z, or the diffraction indices h, k, and l, they are
defined in the same DDL1 data block. In DDL2 each data item is defined separately.
</li>
<li>
the <tt>_list</tt> attribute tells us that the fractional coordinates must be
present in a looped list of category <tt>atom_site</tt>.
</li>
<li>
the <tt>_list_reference</tt> attribute specifies that the data item
<tt>_atom_site_label</tt> must be present in the same looped list as the fractional
coordinates for the list of category atom_site items to be valid.
</li>
<li>
the <tt>_type_conditions</tt> attribute places the condition <tt>esd</tt> on the <tt>_type</tt> value of
<tt>numb</tt>. This means that fractional coordinate numbers may have their estimated
standard deviation (i.e. standard uncertainty) values appended within parentheses.
</li>
<li>
the <tt>_enumeration_default</tt> attribute defines the value that a fractional coordinate
is assumed to have, if it is missing from a CIF.
</li>
</ul>
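<p>
As a rough illustration of the <tt>esd</tt> condition, the following Python sketch splits a
value such as <tt>.4154(4)</tt> into the number and its standard uncertainty. It is not part of
<i>CIFtbx</i>; the function name <tt>split_numb</tt> is hypothetical, and exponent notation is
left out to keep the example short.

```python
import re
from decimal import Decimal

# A plain decimal number with an optional su in parentheses, e.g. ".4154(4)".
# Assumption: exponent notation is not handled in this sketch.
_NUMB_RE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?$")


def split_numb(text):
    """Return (value, su) as Decimal objects; su is None when none is appended."""
    match = _NUMB_RE.match(text)
    if match is None:
        raise ValueError(f"not a plain numb value: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2) is None:
        return value, None
    # The su digits refer to the last quoted decimal places of the value,
    # so they are scaled by the same power of ten.
    exponent = value.as_tuple().exponent
    su = Decimal(match.group(2)).scaleb(exponent)
    return value, su
```

<p>
With this sketch, <tt>split_numb(".4154(4)")</tt> yields the value 0.4154 with a standard
uncertainty of 0.0004, while a value with no parenthesised digits yields <tt>None</tt> for the
uncertainty.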
<p>
<h3 id="1.7.1.2" align="left">1.7.1.2 DDL1 example 2
</h3>
<p>
Here is the DDL1 definition of the data item <tt>_atom_site_label</tt>. This is the
item referred to above as the <tt>_list_reference</tt> data that must be present
in a list of items of category <tt>atom_site</tt>, in order that the CIF be valid.
<p>
<center><table border=1><tr><td><pre>
data_atom_site_label
_name '_atom_site_label'
_category atom_site
_type char
_list yes
_list_mandatory yes
loop_
_list_link_child
'_atom_site_aniso_label'
'_geom_angle_atom_site_label_1'
'_geom_angle_atom_site_label_2'
'_geom_angle_atom_site_label_3'
'_geom_bond_atom_site_label_1'
'_geom_bond_atom_site_label_2'
loop_
_example
C12
Ca3g28
Fe3+17
H*251
boron2a
C_a_phe_83_a_0
Zn_Zn_301_A_0
_definition
; The _atom_site_label is a unique identifier for a particular site in the crystal.
;</pre></td></tr></table></center>
<p>
Note the following in this definition.
<ul>
<li>the attribute <tt>_list_mandatory</tt> with a value of yes signals that this item
must be present in any list of category <tt>atom_site</tt>.</li>
<li>the <tt>_list_link_child</tt> attributes specify data items that are 'child'
dependencies of <tt>_atom_site_label</tt>. This means that this item
must be present in the CIF if any of the dependent items is present.</li>
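<li>the <tt>_list_reference</tt> attribute in the definitions of other <tt>atom_site</tt>
items, such as the fractional coordinates in example 1, refers to this data item,
<tt>_atom_site_label</tt>.</li>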
</ul>
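<p>
The <tt>_list_link_child</tt> rule described above can be sketched as a simple presence check.
The helper below is a hypothetical illustration in Python, not a <i>CIFtbx</i> routine; it
assumes that the parent and child names have already been read from the dictionary into a
Python mapping.

```python
def missing_parents(present_names, link_children):
    """Return parent data names that are required but absent.

    present_names: iterable of data names found in the CIF.
    link_children: mapping of parent name to a list of child names,
                   as given by _list_link_child in the dictionary.
    """
    # CIF data names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in present_names}
    missing = []
    for parent, children in link_children.items():
        if parent.lower() in present:
            continue
        # The parent is required as soon as any one child is present.
        if any(child.lower() in present for child in children):
            missing.append(parent)
    return missing
```

<p>
An empty result means that every child item present in the CIF is accompanied by its parent item.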
<p>
<h3 id="1.7.1.3" align="left">1.7.1.3 DDL1 example 3
</h3>
Here is a more complicated DDL1 definition for the six anisotropic atomic
displacement parameters U<sup>ij</sup>.
<p>
<center><table border=1><tr><td><pre>
data_atom_site_aniso_U_
loop_
_name
'_atom_site_aniso_U_11'
'_atom_site_aniso_U_12'
'_atom_site_aniso_U_13'
'_atom_site_aniso_U_22'
'_atom_site_aniso_U_23'
'_atom_site_aniso_U_33'
_category atom_site
_type numb
_type_conditions esd
_list yes
_list_reference '_atom_site_aniso_label'
_related_item '_atom_site_aniso_B_'
_related_function conversion
_units A^2^
_units_detail 'angstroms squared'
_definition
; These are the standard anisotropic atomic displacement
components in angstroms squared which appear in the
structure factor term:
T = exp{-2pi^2^ sum~i~ [sum~j~ (U^ij^ h~i~ h~j~ a*~i~
a*~j~) ] }
h = the Miller indices
a* = the reciprocal-space cell lengths
The unique elements of the real symmetric matrix are
entered by row.
;</pre></td></tr></table></center>
<p>
Note the following aspects of this definition.
<ul>
<li>the <tt>_related_item</tt> attribute identifies items that are related to the
defined one. The nature of this relationship is specified with
<tt>_related_function</tt>. In this case the value is <tt>conversion</tt>, which means
that the U<sup>ij</sup> can be derived directly from the B<sup>ij</sup>.</li>
<li>the <tt>_units</tt> attributes specify that the U<sup>ij</sup> are given in
ångströms squared.</li>
</ul>
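<p>
The dictionary entry only states that a conversion exists. As a hedged illustration, the
Python sketch below assumes the standard crystallographic relation B = 8&pi;<sup>2</sup>U
between the two forms of displacement parameter; the function names are hypothetical and
are not taken from <i>CIFtbx</i>.

```python
import math

# Assumed conversion factor between B and U displacement parameters:
# B = 8 * pi**2 * U (both in angstroms squared).
EIGHT_PI_SQUARED = 8.0 * math.pi ** 2


def u_to_b(u):
    """Convert a U value to the equivalent B value."""
    return EIGHT_PI_SQUARED * u


def b_to_u(b):
    """Convert a B value to the equivalent U value."""
    return b / EIGHT_PI_SQUARED
```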
<p>
<h3 id="1.7.2" align="left">1.7.2 DDL2 definition examples
</h3>
<p>
The definitions shown above are from the CIF Core dictionary and illustrate how the
DDL1 attributes are used to define data items. The DDL1 approach makes minimal use
of the 'category' of data items, such as <tt>atom_site</tt>. In a sense this is
inefficient because data attributes such as <tt>_list</tt>, <tt>_list_reference</tt>,
<tt>_list_link_child</tt> and <tt>_list_link_parent</tt> refer to properties of
the class or category rather than to individual items. DDL2
<a href="#Westbrook_Hall_1995">[Westbrook, Hall, 1995]</a> takes a more
hierarchical approach to data classes, in which data items of a particular category are
organized into a single table. DDL2 also provides for explicit sub-categories in
which data items are identified by function, e.g. 'matrix'. Although this approach is
less intuitive to the casual user, it has proven to be advantageous in defining complex
data relationships, such as those in the macromolecular dictionary, and is therefore
expected to be of increasing importance in the future as cross-discipline databases develop.
<p>
<h3 id="1.7.2.1" align="left">1.7.2.1 DDL2 example 1
</h3>
<p>
To illustrate the differences between the dictionary approaches, we shall now look at the
<tt>_atom_site</tt> definitions in DDL2.
<p>
<center><table border=1><tr><td><pre>
save__atom_site.fract_x
_item_description.description
; The x coordinate of the atom site position specified as a
fraction of _cell.length_a.
;
_item.name '_atom_site.fract_x'
_item.category_id atom_site
_item.mandatory_code no
_item_aliases.alias_name '_atom_site_fract_x'
_item_aliases.dictionary cif_core.dic
_item_aliases.version 2.0.1
loop_
_item_dependent.dependent_name
'_atom_site.fract_y'
'_atom_site.fract_z'
_item_related.related_name '_atom_site.fract_x_esd'
_item_related.function_code associated_esd
_item_sub_category.id fractional_coordinate
_item_type.code float
_item_type_conditions.code esd
save_
</pre></td></tr></table></center>
<p>
Note the following aspects in this definition.
<ul>
<li>DDL2 definitions are enclosed in a save frame, not a data block.</li>
<li>DDL2 data names contain a dot '.' character that separates the category
(starting the name) from the identity (ending the name).</li>
<li>in DDL2 definitions each data item, independent of its irreducible relationship
to other data items, is defined separately.</li>
<li>DDL2 data names are declared equivalent to other names for the same data item, including
the DDL1-defined names, through the values of the
<tt>_item_aliases.alias_name</tt> attribute.</li>
<li>in DDL2 the <tt>_item_type.code</tt> attribute is equivalent to the DDL1 <tt>_type</tt> attribute,
except that it has a more detailed enumeration, <i>e.g.</i> number has been expanded to integer,
float, etc.</li>
</ul>
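<p>
The naming points above can be sketched in a few lines of Python. The helpers are
hypothetical and only illustrate the idea: one splits a DDL2 name at the dot, the other
maps a DDL1 alias to its DDL2 name using a table that, in practice, would be collected from the
<tt>_item_aliases.alias_name</tt> values in the dictionary. Only the alias quoted in the
definition above is included.

```python
def split_ddl2_name(name):
    """Split a DDL2 data name into (category, item) at the first dot."""
    if not name.startswith("_") or "." not in name:
        raise ValueError(f"not a DDL2-style data name: {name!r}")
    category, item = name[1:].split(".", 1)
    return category, item


# Illustrative alias table: DDL1 name mapped to DDL2 name.
ALIASES = {"_atom_site_fract_x": "_atom_site.fract_x"}


def to_ddl2_name(name):
    """Return the DDL2 name for a known DDL1 alias, or the name unchanged."""
    return ALIASES.get(name, name)
```

<p>
For instance, <tt>split_ddl2_name("_atom_site.fract_x")</tt> gives the category
<tt>atom_site</tt> and the item <tt>fract_x</tt>.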
<p>
<h3 id="1.7.2.2" align="left">1.7.2.2 DDL2 example 2
</h3><p>
Here is another DDL2 definition to emphasise the differences in definition approach.
<p>
<center><table border=1><tr><td><pre>
save__atom_site.aniso_U[1][3]_esd
_item_description.description
; The estimated standard deviation of
_atom_site.aniso_U[1][3].
;
_item.name '_atom_site.aniso_U[1][3]_esd'
_item.category_id atom_site
_item.mandatory_code no
_item_default.value 0.0
loop_
_item_related.related_name
_item_related.function_code
'_atom_site.aniso_U[1][3]'          associated_value
'_atom_site.aniso_B[1][3]_esd'      conversion_constant
'_atom_site_anisotrop.B[1][3]_esd'  conversion_constant
'_atom_site.aniso_B[1][3]_esd'      alternate_exclusive
'_atom_site_anisotrop.B[1][3]_esd'  alternate_exclusive
'_atom_site_anisotrop.U[1][3]_esd'  alternate_exclusive
_item_sub_category.id matrix
_item_type.code float
_item_units.code angstroms_squared
save_
</pre></td></tr></table></center>
<p>
Note the following in this definition.
<ul>
<li>DDL2 defines the esd (or su) of U<sup>13</sup> as a separate data item, whereas in
DDL1 the su is assumed to be appended to the value.</li>
<li>an additional classification attribute <tt>_item_sub_category.id</tt>
is defined in DDL2.</li>
<li>in DDL2 definitions each data item, independent of its irreducible
relationship to other data items, is defined separately.</li>
</ul>
<h3 id="1.7.2.3" align="left">1.7.2.3 DDL2 example 3
</h3>
<p>
Finally here is how the properties of the category <tt>atom_site</tt> are defined in DDL2.
<p>
<center><table border=1><tr><td><pre>
save_ATOM_SITE
_category.description
; Data items in the ATOM_SITE category record details about
the atom sites in a macromolecular crystal structure,
such as the positional coordinates, atomic displacement
parameters, magnetic moments and directions, and so on.
The data items for describing anisotropic temperature or
thermal displacement factors are only used if the
corresponding items are not given in the
ATOM_SITE_ANISOTROP category.
;
_category.id atom_site
_category.mandatory_code no
_category_key.name '_atom_site.id'
loop_
_category_group.id
'inclusive_group'
'atom_group'
save_</pre></td></tr></table></center>
<p>
Note the following aspects in this category definition.
<ul>
<li>in DDL2 the attribute <tt>_category_key.name</tt>, which is equivalent to
the DDL1 <tt>_list_reference</tt>, is defined only once, whereas in DDL1 it must
be declared in the definition of each data item.</li>
<li>the category attributes are identified by the name structure <tt>_category_</tt> as
opposed to the <tt>_item_</tt> prefix used to define data items. It is important
to emphasise that <tt>atom_site</tt> is NOT a data item and will not appear in a CIF.</li>
</ul>
<p>
<h3 id="1.8" align="left">1.8 Handling DDL1 and DDL2 name structures
</h3>
<p>
The different naming structures in the two dictionary languages, DDL1 and DDL2,
might appear to complicate the use of CIFs. This is avoided because the <i>CIFtbx</i>
toolbox handles these naming conventions transparently and interchangeably,
provided it has access to the relevant dictionaries.
<p>
We shall now look quickly at some data items expressed in both conventions.
Here is an extract of a CIF containing core data items.
<p>
<center><table border=1><tr><td><pre>
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_thermal_displace_type
_atom_site_calc_flag
_atom_site_calc_attached_atom
O1 .4154(4) .5699(1) .3026(0) .060(1) Uani ? ?
C2 .5630(5) .5087(2) .3246(1) .060(2) Uani ? ?
C3 .5350(5) .4920(2) .3997(1) .048(1) Uani ? ?
N4 .3570(3) .5558(1) .4167(0) .039(1) Uani ? ?
C5 .3000(5) .6122(2) .3581(1) .045(1) Uani ? ?
loop_
_atom_site_aniso_label
_atom_site_aniso_U_11
_atom_site_aniso_U_22
_atom_site_aniso_U_33
_atom_site_aniso_U_12
_atom_site_aniso_U_13
_atom_site_aniso_U_23
_atom_site_aniso_type_symbol
O1 .071(1) .076(1) .0342(9) .008(1) .0051(9) -.0030(9) O
C2 .060(2) .072(2) .047(1) .002(2) .013(1) -.009(1) C
C3 .038(1) .060(2) .044(1) .007(1) .001(1) -.005(1) C
N4 .037(1) .048(1) .0325(9) .0025(9) .0011(9) -.0011(9) N
C5 .043(1) .060(1) .032(1) .001(1) -.001(1) .001(1) C
</pre></td></tr></table></center>
<p>
In this CIF, the anisotropic atomic displacement parameters have been looped in
a separate list from the atomic coordinates. Each row in the second list is
linked to a row in the list of atomic coordinates by the value of
<tt>_atom_site_aniso_label</tt> that matches the value of <tt>_atom_site_label</tt>
in the associated row of the list of atomic coordinates. Though it is customary to
align the ordering of the two lists, CIF does not require them to be in the same
order, only that the labels can be matched. Alternatively, the two lists could have
been merged into one, using the same tags.
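<p>
The label matching described above can be sketched as follows. This is a minimal Python
illustration, not a <i>CIFtbx</i> routine; it assumes each list has already been read into a
Python list of dictionaries keyed by data name, and the function name is hypothetical.

```python
def merge_by_label(sites, aniso):
    """Attach anisotropic rows to coordinate rows with the same label.

    sites: rows containing '_atom_site_label'.
    aniso: rows containing '_atom_site_aniso_label', in any order.
    """
    # Index the second list by label so that row order does not matter.
    by_label = {row["_atom_site_aniso_label"]: row for row in aniso}
    merged = []
    for site in sites:
        row = dict(site)
        # Sites with no matching anisotropic row are kept unchanged.
        row.update(by_label.get(site["_atom_site_label"], {}))
        merged.append(row)
    return merged
```

<p>
Because the match is made on the label value, the result is the same whether or not the two
lists are given in the same order.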
<p>
In the DDL2-based mmCIF dictionary there are two alternate sets of names for
presentation of anisotropic atomic displacement parameters, one set in the
<tt>atom_site</tt> category, and another set in a distinct <tt>atom_site_anisotrop</tt>
subcategory. Either set of names may be used in a CIF, but not both. If the names
from the parent category are used, they must appear in the same list as the other
<tt>atom_site</tt> items. An
atomic coordinate list with anisotropic displacement parameters merged into
the same list in mmCIF would look like this.
<p>
<center><table border=1><tr><td><pre>
loop_
_atom_site.label_seq_id
_atom_site.auth_asym_id
_atom_site.group_PDB
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.auth_seq_id
_atom_site.label_alt_id
_atom_site.cartn_x
_atom_site.cartn_y
_atom_site.cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.footnote_id
_atom_site.label_entity_id
_atom_site.id
_atom_site.aniso_U[1][1]
_atom_site.aniso_U[1][2]
_atom_site.aniso_U[1][3]
_atom_site.aniso_U[2][2]
_atom_site.aniso_U[2][3]